Hidden-Markov Program Algebra 
with iteration 



Annabelle McIver, Larissa Meinicke, Carroll Morgan






Jan 2011 



Abstract 



We use Hidden Markov Models to motivate a quantitative composi- 
tional semantics for noninterference-based security with iteration, includ- 
ing a refinement- or "implements" relation that compares two programs 
with respect to their information leakage; and we propose a program al- 
gebra for source-level reasoning about such programs, in particular as a 
means of establishing that an "implementation" program leaks no more 
than its "specification" program. 

This joins two themes: we extend our earlier work, which had iteration but was only qualitative [37], by making it quantitative; and we extend our earlier quantitative work [57] by including iteration.

We advocate stepwise refinement and source-level program algebra — 
both as conceptual reasoning tools and as targets for automated assis- 
tance. A selection of algebraic laws is given to support this view in the 
case of quantitative noninterference; and it is demonstrated on a simple 
iterated password-guessing attack. 

1 Introduction: extant theory and practices 

Hidden Markov Models, or HMM's, extend Markov Processes by supposing that the process state is not directly visible: only certain observations of it can be made [55]. How HMM's motivate a quantitative noninterference-security program semantics is our principal topic: the hidden state of the HMM has "high security" and the observations that the HMM allows are "low security."

Program algebra is the manipulation of program texts themselves, i.e. as syntax and according to algebraic rules laid down beforehand, with the aim of showing equivalence or ordering with respect to a so-called "refinement" relation between one program and another. That requires a semantics, and proofs of the elementary rules with respect to that semantics. Furthermore these rules must be preserved by context in order for true algebra to be possible: in programming semantics, that last property is called compositionality. This represents an "up front" cost for reasoning about program behaviours. Once that cost has been paid, however, just once, the benefits accrue forever after, every time an equality or refinement can be shown syntactically without "descending" into the semantics.

The significance of iteration is that its proper treatment, via suprema of chains, makes interesting demands on the semantic machinery already set up for straight-line, quantitative noninterference programs [27].



Our first specific contribution extends an existing (but recent [57]) compositional semantics for straight-line quantitative noninterference security, one with a novel two-level "hyperdistribution" semantics, by showing how hypers (for short), previously introduced without detailed motivation, are in fact directly suggested by the mathematical machinery of HMM's (§3). Our second contribution adds iterating programs to that semantics (§6), thus requiring a treatment of nontermination and fixed-points: this would be straightforward were it not for the fact that supremum-completeness, on which fixed-points' existence usually relies, does not appear to hold.

Our third contribution (§7) is to show how, in spite of the incompleteness, we can via a more specialised "termination order" retain discrete distributions for the treatment of loops: that gives a simpler theory than (the more general) measures would require. Nevertheless, our further goal of extending compositional closure [22] to iteration does seem to require measures: at that point, there is no escape (§12).

Our final contribution is a selection of algebraic laws (§9), and the treatment of an example (§10) illustrating the style of reasoning we hope they will facilitate.



2 Program algebra and refinement 

Algebra is powerful, and it is general; and it is especially useful in program 
verification where algebra's feature of compositionality allows the reuse that 
simplifies verification tasks. Program algebra in particular provides equalities 
or refinements (see below) that, although proved in isolation between program 
fragments, can then be reused freely within arbitrary contexts, drastically sim- 
plifying correct-by-construction and/or post-hoc verification arguments. 

A refinement ordering between programs is weaker than equality: it defines the relationship that must hold between specifications and their implementations in a given application domain [33]. In special applications, such as noninterference security, the refinement relation is adjusted (usually made more restrictive) to take further aspects into account: here it will be the possible release of high-level information. Thus secure refinement checks not only (non)termination, but also compares programs to see which one releases more information about hidden, high-security variables: it is more distinguishing than standard program refinement.
For example, take integer variables v, h and suppose that v is visible (low-security) whereas h is hidden (high-security). Furthermore, assume an attacker with "perfect recall," i.e. one who remembers visible variables' values even if they are subsequently overwritten. (We explain this assumption in §4.2 and §12 below.) Then we would expect the refinement

$$(v:= h \div 2;\ v:= v \div 2) \;\sqsubseteq\; v:= h \div 4$$

but, crucially, not the reverse. Here $\div$ is integer division, so that $h \div 2$ is 0 for h=1 but 1 for h=3, whereas $h \div 4$ is 0 for both. On the left, observing the first assignment to v as well as the second (perfect recall) therefore allows us to distinguish h=1 from h=3; but on the right we cannot do that. The right-hand program is a refinement of the left-hand one because it is more secure; with an appropriate security-refinement algebra we would show this syntactically (§9.3).
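For deterministic programs such as these two, the comparison can be pictured as partition coarsening: group the initial values of h by the sequence of values of v that an attacker with perfect recall sees, and check that every group on the left lies inside some group on the right. The sketch below is only an illustration and rests on two assumptions not made in the text: h ranges over 0..7, and $\div$ is Python's integer division `//`. It is not the quantitative refinement relation itself, which is developed for hyperdistributions in Section 5.

```python
# Illustrative sketch (assumptions: h in 0..7, "÷" is integer division //).

def left_trace(h):
    """Observable trace of  v:= h÷2; v:= v÷2  (both values of v are recalled)."""
    v1 = h // 2
    v2 = v1 // 2
    return (v1, v2)

def right_trace(h):
    """Observable trace of  v:= h÷4  (only one value of v is ever visible)."""
    return (h // 4,)

def partition(trace, hs):
    """Group hidden values by the observable trace they produce."""
    cells = {}
    for h in hs:
        cells.setdefault(trace(h), []).append(h)
    return sorted(cells.values())

def is_coarsening(fine, coarse):
    """True iff every cell of `fine` lies inside some cell of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)

left = partition(left_trace, range(8))    # [[0, 1], [2, 3], [4, 5], [6, 7]]
right = partition(right_trace, range(8))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(is_coarsening(left, right))  # True: the right-hand program reveals less
print(is_coarsening(right, left))  # False: not the reverse
```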



3 Hidden Markov models and hyper-distributions 

3.1 Basic structure of HMM's 

A Hidden Markov Model comprises a set $\mathcal{X}$ of states, a set $\mathcal{Y}$ of observations, and two stochastic matrices $T, E$ [55]: the transition probabilities $T$ give for any two states $x_0, x_1 \in \mathcal{X}$ the (conditional) probability $T(x_1 \mid x_0)$ that a transition will end in final state $x_1$ given that it began in initial state $x_0$; and the emission probabilities give for any state $x_0$ and observation $y_1 \in \mathcal{Y}$ the probability $E(y_1 \mid x_0)$ that $y_1$ will be emitted, and thus observed, given the initial state $x_0$. Typically an HMM is analysed over a number of steps $i = 0, 1, \dots$ from some initial distribution $X_0$ over $\mathcal{X}$, so that a succession of states $x_1, x_2, \dots$ and observations $y_1, y_2, \dots$ occurs, where each $x_i$ is related to $x_{i+1}$ by $T$ and to $y_{i+1}$ by $E$.

We assume finitely many states in the state space, and thus use discrete 
distributions throughout. 

We illustrate a single step in Fig. 1. With the distribution $X_0$ of incoming state $x_0$, the distribution $X_1$ of outgoing states $x_1$ is the multiplication of $X_0$ as a row-vector by $T$ as a matrix, that is $\Pr(X_1{=}x_1) := \sum_{x_0} \Pr(X_0{=}x_0)\,T(x_1 \mid x_0)$. Similarly the distribution $Y_1$ of observations $y_1$ is given by a (matrix) multiplication amounting to $\Pr(Y_1{=}y_1) := \sum_{x_0} \Pr(X_0{=}x_0)\,E(y_1 \mid x_0)$. The "hidden" essence of the model is that though we cannot see the incoming $x_0$'s and the outgoing $x_1$'s directly, still the observation of $y_1$ tells us something about each if we do know the incoming state distribution $X_0$ and the matrices $T, E$.
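The a priori step is just two vector-matrix products. Below is a minimal Python sketch with exact fractions; the two-state HMM (states a, b; observations y, n) and all of its numbers are invented for illustration, and dictionaries of dictionaries stand in for the matrices $T$ and $E$.

```python
from fractions import Fraction as Fr

# Illustrative HMM: states {"a", "b"}, observations {"y", "n"}; numbers are made up.
X0 = {"a": Fr(1, 2), "b": Fr(1, 2)}            # incoming distribution X_0
T = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)},      # T[x0][x1] = T(x1 | x0)
     "b": {"a": Fr(0), "b": Fr(1)}}
E = {"a": {"y": Fr(1), "n": Fr(0)},            # E[x0][y1] = E(y1 | x0)
     "b": {"y": Fr(1, 3), "n": Fr(2, 3)}}

def row_times_matrix(row, matrix):
    """Row vector times stochastic matrix: out[j] = sum_i row[i] * matrix[i][j]."""
    out = {}
    for i, p in row.items():
        for j, q in matrix[i].items():
            out[j] = out.get(j, Fr(0)) + p * q
    return out

X1 = row_times_matrix(X0, T)  # a priori outgoing-state distribution
Y1 = row_times_matrix(X0, E)  # distribution of emitted observations
print(X1)  # {'a': Fraction(3, 8), 'b': Fraction(5, 8)}
print(Y1)  # {'y': Fraction(2, 3), 'n': Fraction(1, 3)}
```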



3.2 A priori and a posteriori distributions 
on the state-space X 

The a priori distribution on the HMM's input is $X_0$, and the a posteriori distribution on the input can be calculated from $T$, $E$, and $X_0$ in the usual way, via Bayes' formula, once we have observed $y_1$.

But we concentrate instead on the output. The a priori distribution of outgoing $x_1$ is $X_1$ as calculated in §3.1 above. Its a posteriori distribution is

^The countably infinite supports mentioned in the introduction occur, eventually, in spite of this finiteness assumption (§6.1).
Figure 1: A Hidden Markov model, a priori view (diagram labels: incoming-state distribution; distribution on observations; outgoing-state distribution). By a priori we mean that the distribution $X_1$ is determined statically, from information "already" available, and in particular is not derived from an actual execution.



conditioned on the emitted $y_1$ actually observed: it too is determined by the usual Bayes formula

$$\Pr(X_1{=}x_1 \mid Y_1{=}y_1) \;=\; \frac{\sum_{x_0} \Pr(X_0{=}x_0)\,E(y_1 \mid x_0)\,T(x_1 \mid x_0)}{\sum_{x_0} \Pr(X_0{=}x_0)\,E(y_1 \mid x_0)}\,, \qquad (1)$$

that is the (joint) probability that $x_1, y_1$ both occurred divided by the overall (marginal) probability that $y_1$ occurred. Thus before we observe any $y_1$ we believe the distribution of outgoing $x_1$ to be $X_1$, and after we observe $y_1$ we believe that distribution to be as in (1). This view is illustrated in Fig. 2.
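Formula (1) can be evaluated directly. This sketch uses an invented two-state HMM (states a, b; observations y, n) purely for illustration; `posterior` is a hypothetical helper name, not notation from the paper.

```python
from fractions import Fraction as Fr

# Illustrative HMM; numbers are made up for this sketch.
X0 = {"a": Fr(1, 2), "b": Fr(1, 2)}
T = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)}, "b": {"a": Fr(0), "b": Fr(1)}}
E = {"a": {"y": Fr(1), "n": Fr(0)}, "b": {"y": Fr(1, 3), "n": Fr(2, 3)}}

def posterior(X0, T, E, y1):
    """Pr(X1=x1 | Y1=y1) by Bayes' formula (1): joint over marginal."""
    marginal = sum(p * E[x0].get(y1, Fr(0)) for x0, p in X0.items())
    if marginal == 0:
        raise ValueError(f"observation {y1!r} has probability zero")
    joint = {}  # joint[x1] = sum over x0 of X0(x0) * E(y1|x0) * T(x1|x0)
    for x0, p in X0.items():
        for x1, t in T[x0].items():
            joint[x1] = joint.get(x1, Fr(0)) + p * E[x0].get(y1, Fr(0)) * t
    return {x1: j / marginal for x1, j in joint.items()}

print(posterior(X0, T, E, "y"))  # {'a': Fraction(9, 16), 'b': Fraction(7, 16)}
print(posterior(X0, T, E, "n"))  # {'a': Fraction(0, 1), 'b': Fraction(1, 1)}
```

Seeing n rules state a out entirely, while seeing y only shifts the odds.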

3.3 The attacker's point of view: 
an equivalent representation 

Although the matrices $T, E$ determine the HMM completely, we suggest that from the point of view of an attacker trying to determine the state of the HMM, it would be more useful to consider a different (but equivalent) formulation: the effect of one step from a known initial distribution $X_0$ is a joint distribution over observations in $\mathcal{Y}$ and their corresponding outgoing conditional distributions over $\mathcal{X}$. This structure thus comprises values $\Delta$ of type $\mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})$, where we write $\mathbb{D}\mathcal{X}$ and similar for the type of discrete distributions over $\mathcal{X}$, thus one-summing functions of type $\mathcal{X}\to[0,1]$. That is, each $\Delta$ gives for a pair $(y_1, \delta_1)$ in $\mathcal{Y}\times\mathbb{D}\mathcal{X}$ the probability that an attacker will observe $y_1$ and will conclude from it that $x_1$ has a posteriori distribution $\delta_1$.

We call such $\Delta$-values hyperdistributions, or just hypers. Since $\Delta$ is a joint distribution (jointly over $\mathcal{Y}$ and $\mathbb{D}\mathcal{X}$), we can speak of its left- and right-marginal distributions: the left-marginal distribution of $\Delta$ is of type $\mathbb{D}\mathcal{Y}$, and is in fact just $Y_1$ from above. That is, the distribution $Y_1$ of emitted observations is recovered as the left marginal of $\Delta$.
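Collecting, for each observation, its probability together with the corresponding a posteriori distribution yields the hyper. A minimal sketch, again with an invented two-state HMM; inner distributions are frozen into sorted tuples so that they can serve as dictionary keys, which is a representation choice of ours. Summing over the pairs recovers the left marginal $Y_1$, and averaging the inners with their weights gives back the a priori $X_1$.

```python
from fractions import Fraction as Fr

# Illustrative HMM; numbers are made up for this sketch.
X0 = {"a": Fr(1, 2), "b": Fr(1, 2)}
T = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)}, "b": {"a": Fr(0), "b": Fr(1)}}
E = {"a": {"y": Fr(1), "n": Fr(0)}, "b": {"y": Fr(1, 3), "n": Fr(2, 3)}}

def freeze(d):
    """Hashable form of a distribution (zero-probability points dropped)."""
    return tuple(sorted((k, v) for k, v in d.items() if v))

def one_step_hyper(X0, T, E):
    """The hyper in D(Y x DX): each observation y1, paired with its
    a posteriori outgoing distribution delta1, gets probability Pr(Y1=y1)."""
    observations = sorted({y for row in E.values() for y in row})
    hyper = {}
    for y1 in observations:
        p_y1 = sum(p * E[x0].get(y1, Fr(0)) for x0, p in X0.items())
        if p_y1 == 0:
            continue  # y1 is never emitted, so it contributes no pair
        joint = {}
        for x0, p in X0.items():
            for x1, t in T[x0].items():
                joint[x1] = joint.get(x1, Fr(0)) + p * E[x0].get(y1, Fr(0)) * t
        delta1 = {x1: j / p_y1 for x1, j in joint.items()}
        hyper[(y1, freeze(delta1))] = p_y1
    return hyper

Delta = one_step_hyper(X0, T, E)

left = {}      # left marginal: the observation distribution Y1
averaged = {}  # right marginal, averaged: should equal the a priori X1
for (y1, delta1), p in Delta.items():
    left[y1] = left.get(y1, Fr(0)) + p
    for x1, q in delta1:
        averaged[x1] = averaged.get(x1, Fr(0)) + p * q

print(left)  # {'n': Fraction(1, 3), 'y': Fraction(2, 3)}
```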
Figure 2: A Hidden Markov Model, a posteriori view (diagram labels: incoming-state distribution; emitted-value distribution $Y_1$; outgoing-state conditional distributions $\delta_1, \delta_2, \delta_3$ corresponding to $y_1, y_2, y_3$; the pair $(y_j, \delta_j)$ occurs with probability $p_j = \Pr(Y_1{=}y_j)$, so that $X_1 = \sum_j p_j \delta_j$). By a posteriori we mean that the conditional distributions $\delta_{\{1,2,3\}}$ are deduced after observation of the emitted values $y_{\{1,2,3\}}$, and represent a revision of the a priori knowledge of the outgoing state as represented in the $X_1$ of Fig. 1.



The right-marginal distribution of the hyper $\Delta$ is more interesting: it is of type $\mathbb{D}^2\mathcal{X}$ and, although it averages to the outgoing state distribution $X_1$ (in the sense shown in Fig. 2), most of the popular (conditional) information-entropy measurements are likely to decrease, becoming less than the entropy of $X_1$ itself: that decrease quantifies the "leak" that the emissions of $Y_1$ represent. For example, the conditional Shannon Entropy of $\Delta$, defined as $\sum_{\delta\in\mathbb{D}\mathcal{X}} \Delta(\delta)\,H(\delta)$ over the possible a posteriori distributions $\delta$ (where $\Delta(\delta)$ is the right-marginal probability of $\delta$), is no more than $H(X_1)$, the Shannon Entropy of the a priori outgoing distribution $X_1$ itself.
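As a numerical check of this inequality, the sketch below computes the conditional Shannon Entropy of an illustrative right marginal and compares it with $H(X_1)$. The two inners and their weights are invented, chosen so that they average to $X_1 = \{a^{@3/8}, b^{@5/8}\}$; natural logarithms are used, matching the definition of $H$ in the footnote on Shannon Entropy.

```python
import math
from fractions import Fraction as Fr

def shannon(dist):
    """H(X) = -sum_x X.x ln X.x, skipping zero-probability points."""
    return -sum(float(p) * math.log(float(p)) for p in dist.values() if p > 0)

def conditional_shannon(right_marginal):
    """Sum over inners delta of weight(delta) * H(delta), for [(delta, weight)]."""
    return sum(float(w) * shannon(d) for d, w in right_marginal)

# Illustrative right marginal (made-up numbers) averaging to X1 below.
Delta_right = [({"a": Fr(9, 16), "b": Fr(7, 16)}, Fr(2, 3)),
               ({"a": Fr(0), "b": Fr(1)}, Fr(1, 3))]
X1 = {"a": Fr(3, 8), "b": Fr(5, 8)}

print(conditional_shannon(Delta_right) <= shannon(X1))  # True
```

The inequality is an instance of the concavity of $H$: the entropy of an average is at least the average of the entropies.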

Thus the denotational-style semantic representation we extract from HMM theory is the hyperdistribution of type $\mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})$, a nesting of one distribution within another. As we will see, this allows us to equip the semantic space with a "refinement" partial order; but it is security refinement, so that for hypers $\Delta_{\{0,1\}}$ one can speak of whether $\Delta_0$ is more or less secure than $\Delta_1$ or, if not, whether they are perhaps simply security-incomparable.



3.4 A probabilistic monad 

A further benefit of conventional denotational techniques is our access to computational monads [32], simplifying the presentation considerably.

From here on, we use a dot "." for function application, rather than parentheses (·), writing thus f.x rather than f(x). For Curried functions we will usually have f.x.y rather than e.g. either f(x,y) or f(x)(y). As a result, given distribution $X{:}\,\mathbb{D}\mathcal{X}$, the probability it assigns to $x$ is simply $X.x$, that is $\Pr(X{=}x)$ but written more compactly and taking advantage of the fact that $X$ is just a function (of type $\mathcal{X}\to[0,1]$). The same economy accrues for random variables.

^For distribution $X$ in $\mathbb{D}\mathcal{X}$ such that $\Pr(X{=}x_i)$ is $p_i$, the Shannon Entropy $H(X)$ of $X$ is given by $-\sum_i p_i \ln p_i$. As remarked, a number of other security-based definitions of entropy give the same inequality [24].

^An advantage of this is that it distinguishes function application from the many other uses of parentheses, and produces self-contained expressions, thus with less clutter. In this respect we compare $H.X = -\sum_x X.x \ln(X.x)$ with the conventional presentation of Shannon Entropy in

The monad structure for computations [32] supposes a triple $(K, \eta, \mu)$ where $K$ is an endofunctor in a given category, and $\eta, \mu$ are natural transformations satisfying certain coherence conditions. An example of this is the Giry monad [15], typically used for probabilistic computations; in its general form, its functor takes an object $(\Omega, \mathcal{B}_\Omega)$ comprising a set $\Omega$ and a sigma-algebra $\mathcal{B}_\Omega$ on it to the set of probability measures on $(\Omega, \mathcal{B}_\Omega)$, endowed with a suitable sigma-algebra of its own, induced from the given $\mathcal{B}_\Omega$.

Working here with discrete measures, our use of the monad will be modest and we will use suggestive names for its components, based on its specialisation to discrete distributions and functional programming. In particular,

functor $\mathbb{D}$: Given set $\mathcal{S}$ write $\mathbb{D}\mathcal{S}$ for the set of discrete distributions over $\mathcal{S}$.

push-forward map: Given two sets $\mathcal{X}, \mathcal{Y}$ and a function $f{:}\,\mathcal{X}\to\mathcal{Y}$ write $\mathbb{D}f$, the action of the functor on the function, as $\mathit{map}.f{:}\,\mathbb{D}\mathcal{X}\to\mathbb{D}\mathcal{Y}$. In the probability literature this is called the push-forward, defined for any $X{:}\,\mathbb{D}\mathcal{X}$ and $y{:}\,\mathcal{Y}$ in the discrete case as

$$\mathit{map}.f.X.y \;:=\; X.(f^{-1}.y) \;=\; \Big(\sum x{:}\,\mathcal{X} \mid f.x{=}y \cdot X.x\Big)\,.$$

multiplication avg: The multiplication (natural) transformation $\mu{:}\,\mathbb{D}^2\mathcal{X}\to\mathbb{D}\mathcal{X}$ averages the distributions in its argument distribution-of-distributions, to give a distribution again. We write that as $\mathit{avg}$ for "average" and in the discrete case for $\Delta{:}\,\mathbb{D}^2\mathcal{X}$ it is defined for any $x{:}\,\mathcal{X}$ as

$$\mathit{avg}.\Delta.x \;:=\; \Big(\sum X{:}\,\mathbb{D}\mathcal{X} \cdot \Delta.X \times X.x\Big)\,.$$

Kleisli composition via lifting: For two functions $f{:}\,\mathcal{X}\to\mathbb{D}\mathcal{Y}$ and $g{:}\,\mathcal{Y}\to\mathbb{D}\mathcal{Z}$, the lift of $g$, written $g^*$, is defined to be the functional composition $\mathit{avg}\circ\mathit{map}.g$ of type $\mathbb{D}\mathcal{Y}\to\mathbb{D}\mathcal{Z}$, so that the Kleisli composition $g$ after $f$ is expressed $g^*\circ f$ in the usual way.
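For readers who prefer executable definitions, here is a sketch of these discrete monad operations on finite distributions represented as Python dictionaries. The names `unit`, `dmap`, `avg`, `lift` and `freeze` are ours (`dmap` avoids shadowing Python's built-in `map`); inner distributions are frozen into sorted tuples so that a distribution of distributions can itself be a dictionary.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(d):
    """Hashable form of a distribution, so it can be an element of another one."""
    return tuple(sorted((k, v) for k, v in d.items() if v))

def unit(x):
    """eta: the point distribution at x."""
    return {x: Fraction(1)}

def dmap(f, X):
    """Push-forward map.f.X: result[y] = sum of X[x] over all x with f(x) == y."""
    out = defaultdict(Fraction)
    for x, p in X.items():
        out[f(x)] += p
    return dict(out)

def avg(Delta):
    """mu: average a distribution over (frozen) distributions."""
    out = defaultdict(Fraction)
    for inner, q in Delta.items():
        for x, p in inner:
            out[x] += q * p
    return dict(out)

def lift(g):
    """g* = avg . map.g, turning g: Y -> D Z into a function D Y -> D Z."""
    return lambda Y: avg(dmap(lambda y: freeze(g(y)), Y))

half = Fraction(1, 2)
X = {1: half, 2: half}
g = lambda y: {y: half, y + 1: half}   # hypothetical Kleisli arrow
print(dmap(lambda x: x % 2, X))  # {1: Fraction(1, 2), 0: Fraction(1, 2)}
print(lift(g)(X))                # {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
```

The monad's unit laws can be spot-checked with these definitions: lifting `g` and applying it to `unit(y)` gives `g(y)`, and lifting `unit` gives the identity.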

Similar definitions and notations apply for the monad $\underline{\mathbb{D}}$ associated with the partial distributions, those that sum to no more than one [47].

The immediate benefit from this monadic structure is the sequential composition of two HMM's, written say $H_1;H_2$, so that the final a posteriori distribution takes the observations from the first as well as the second HMM into account. If we take the type of $H_{\{1,2\}}$ to be $\mathcal{Y}\times\mathbb{D}\mathcal{X}\to\mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})$, then we want our definition of $H_1;H_2$ to have similar type, thus giving it the same features as a single HMM provided that the observations from both of its components are considered: the general case is illustrated in Fig. 3.

Footnote 2, with its indices i and temporary names $p_i$.

^This is consistent with the definition of map in functional programming.
Figure 3: A Hidden Markov Model: iteration accumulates leakage



For our single composition $H_1;H_2$, the outgoing result of $H_1$ is an a posteriori hyper which is then presented "en bloc" as input to $H_2$. Via the type constructor, that intermediate hyper is a partitioning of some flattened distribution of type $\mathcal{Y}\times\mathbb{D}\mathcal{X}$ according to the observables emitted by $H_1$. Each partition of that flattened distribution, itself of type $\mathcal{Y}\times\mathbb{D}\mathcal{X}$, is separately input to $H_2$; but after the final output is produced the partitioning is "reassembled" in the overall final a posteriori distribution, thus neatly taking the observations from both $H_{\{1,2\}}$ into account. Crucially this partitioning and reassembling is done according to the original weightings, which is what allows us to use the monad:

$$\mathcal{Y}\times\mathbb{D}\mathcal{X} \xrightarrow{\;H_1\;} \mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X}) \xrightarrow{\;\mathit{map}.H_2\;} \mathbb{D}^2(\mathcal{Y}\times\mathbb{D}\mathcal{X}) \xrightarrow{\;\mathit{avg}\;} \mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})\,.$$

Here the original input type $\mathcal{Y}\times\mathbb{D}\mathcal{X}$ is transformed as we suggest by $H_1$ to $\mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})$, which then in its partitioned form is passed to $H_2$ and, via the map/avg construction, that partition is reassembled after the action of $H_2$ on its components. That supplies our definition for $H_1;H_2$, thus also of type $\mathcal{Y}\times\mathbb{D}\mathcal{X}\to\mathbb{D}(\mathcal{Y}\times\mathbb{D}\mathcal{X})$. Note we do not need $\mathcal{Y}^2$ to "combine" the observations of the two separate HMM's, an important advantage of this presentation: in §4.2 (item 4) this is explained further.
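Sequential composition can be sketched the same way: each HMM step maps a state $(y, \delta)$ to a distribution over pairs $(y', \delta')$, and $H_1;H_2$ runs $H_2$ on every outcome of $H_1$ and reassembles the results with $H_1$'s weights, which is $\mathit{avg}\circ\mathit{map}.H_2\circ H_1$ fused into a single loop. The HMM and its numbers are invented; `hmm` and `seq` are hypothetical helper names, and the step ignores its incoming observation, as a plain HMM does. As a sanity check, averaging the final inners recovers the a priori two-step distribution $X_0 T T$.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Illustrative HMM; numbers are made up for this sketch.
T = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)}, "b": {"a": Fr(0), "b": Fr(1)}}
E = {"a": {"y": Fr(1), "n": Fr(0)}, "b": {"y": Fr(1, 3), "n": Fr(2, 3)}}

def freeze(d):
    """Hashable form of a distribution (zero-probability points dropped)."""
    return tuple(sorted((k, v) for k, v in d.items() if v))

def hmm(T, E):
    """One HMM step of type Y x DX -> D(Y x DX); the incoming y is ignored."""
    observations = sorted({y for row in E.values() for y in row})
    def step(state):
        _, delta = state
        delta = dict(delta)
        out = {}
        for y1 in observations:
            p_y1 = sum(p * E[x0].get(y1, Fr(0)) for x0, p in delta.items())
            if p_y1 == 0:
                continue
            joint = defaultdict(Fr)
            for x0, p in delta.items():
                for x1, t in T[x0].items():
                    joint[x1] += p * E[x0].get(y1, Fr(0)) * t
            out[(y1, freeze({x: j / p_y1 for x, j in joint.items()}))] = p_y1
        return out
    return step

def seq(H1, H2):
    """H1;H2 as avg . map.H2 . H1: run H2 on every outcome of H1 and
    reassemble the results with H1's weights (equal outcomes are summed)."""
    def composed(state):
        out = defaultdict(Fr)
        for s1, p in H1(state).items():
            for s2, q in H2(s1).items():
                out[s2] += p * q
        return dict(out)
    return composed

H = hmm(T, E)
start = (None, freeze({"a": Fr(1, 2), "b": Fr(1, 2)}))
two_steps = seq(H, H)(start)

prior = defaultdict(Fr)  # average of all final inners, weighted
for (_, delta), p in two_steps.items():
    for x, q in delta:
        prior[x] += p * q
print(dict(prior))
```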



^We call it a distribution simply to avoid a proliferation of names. In fact it is isomorphically a distribution of type $\mathbb{D}(\mathcal{Y}\times\mathcal{X})$ whose left marginal is a point distribution.
4 Quantitative noninterference security
for programs 



4.1 Noninterference via hidden- and visible variables; 
atomicity 

Take a simple programming model comprising a finite set $\mathcal{H}$ of hidden states, ranged over by (high-security) program variable(s) h, and a finite set $\mathcal{V}$ of visible states, ranged over by (low-security) program variable(s) v. The state space overall is thus the product $\mathcal{V}\times\mathcal{H}$, and our program texts refer to variables v, h.

Observers of the program's execution can see v, but they cannot see h. At- 
tackers of the program try to learn about h's final values, or at least their 
distribution, by observing v's values as execution of the program proceeds. 

We begin with assignment statements as a basis: a simultaneous assignment is written v,h:= V,H, allowing both expressions V, H to refer to the initial values of v, h without worrying about which one is updated first: the two expressions V, H may contain variables of either kind, or both. We base the assignment on a probabilistic-choice syntax x:∈ X that means "choose the new value of x according to the distribution X", where X itself is a distribution-valued expression that possibly depends on the initial value of x as well as other variables.

To keep track of variables in the generic semantic definitions, we write the distribution-valued expressions as explicit functions applied to them, so arriving at v,h:∈ E.v.h, T.v.h as our basic simultaneous probabilistic assignment to the two variables; actual program texts of course simply use expressions over v, h in which such functions might occur. Thus E.v.h is a distribution, depending on the initial values of v, h, according to which v's new value is chosen. Similarly T.v.h is the distribution for the choice of h's new value. The statement is atomic in the sense that only its results are accessible, not how they were computed.

Ordinary, non-probabilistic assignments can be written v,h:= E.v.h, T.v.h in which E, T now give values rather than distributions; they are clearly the special case of the above for point distributions, and so do not need a separate treatment.

4.2 Connecting HMM's and the programming model;
perfect recall

The connection is made by identifying $\mathcal{V}$ with $\mathcal{Y}$, and $\mathcal{H}$ with $\mathcal{X}$. The visible v corresponds to the emitted observations y, thus to a sort of "output buffer"; and the hidden h corresponds to the state x, passed from one program fragment through sequential composition to the next one.

^For simplicity we are assuming that multiple hidden or visible variables are collected separately within vectors h or v, i.e. that v, h are the only variables present; but we won't clutter the presentation with the overheads for that. The program texts can of course refer to individual elements of the vectors, which references are interpreted as projections etc. in the usual way.
A slight generalisation is that the probabilistic choices can be influenced by the immediately previous emitted value (the y from the previous step, whose value is the initial value of v for this step), whereas in an HMM this is typically not done. This is only a notational convenience, since clearly the HMM's state spaces can be elaborated to allow the same freedom; but such conveniences are a part of adapting the HMM framework to programming practice.

Further elaborating the adaptation, we make the following remarks:

1. A typical sequential program will execute many individual atomic steps successively. The outgoing state from one step will be both its final value of h, fed in automatically as the incoming state of the next step, and the last emitted observable value, found in v.

2. Although the observations emitted from each step will successively overwrite earlier values in v, the conditioning that observations of those earlier outputs caused is not lost: it is preserved by the map/avg composition. Thus the partitioning expressed by the growing support-set of the outer $\mathbb{D}$ becomes finer on each step, so that deductions made by an attacker's having seen an earlier v are never forgotten. This is called Perfect Recall.

3. The distribution E.v.h from which v's final value is chosen corresponds to the stochastic matrix E of the HMM. In effect, the h in E.v.h selects the row of E that gives the distribution from which y, that is from which v, is chosen. Similarly, the distribution T.v.h from which h's final value is chosen corresponds to the stochastic matrix T. The programs' access to v is why we include the $\mathcal{Y}$ component in the state type $\mathcal{Y}\times\mathbb{D}\mathcal{X}$.

4. The a priori view of the program is the extent to which we can determine the distribution of the final values of h by knowing the incoming distribution of v, h and the program text. The a posteriori view reflects the extra information about the final h that we have once we actually execute the program and note the successive emissions in v that occur during that execution. However the values of those emissions need not be remembered: only the conditioning they induce is important. That is why we do not need "sequence of $\mathcal{Y}$" in our state, in spite of perfect recall: the recall is expressed in the outer $\mathbb{D}$.
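The correspondence in items 1 to 4 suggests how a single probabilistic assignment v,h:∈ E.v.h, T.v.h acts on a state $(v, \delta)$. The sketch below is one reading of it, not the paper's formal definition: we assume the two choices are made independently given the initial v, h, draw the initial h from $\delta$, and group the outcomes by the new visible value v', conditioning the new h on each. `assign` and `point` are hypothetical helper names, and both example statements are invented.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def freeze(d):
    """Hashable form of a distribution (zero-probability points dropped)."""
    return tuple(sorted((k, v) for k, v in d.items() if v))

def point(x):
    """Point distribution at x (an ordinary, non-probabilistic choice)."""
    return {x: Fr(1)}

def assign(E, T):
    """Sketch of  v,h :∈ E.v.h, T.v.h  on a state (v, delta): v' ~ E(v, h) and
    h' ~ T(v, h) independently, h ~ delta; result is a distribution on (v', delta')."""
    def run(state):
        v, delta = state
        joint = defaultdict(lambda: defaultdict(Fr))  # joint[v'][h']
        for h, p in delta:
            for v1, e in E(v, h).items():
                for h1, t in T(v, h).items():
                    joint[v1][h1] += p * e * t
        out = {}
        for v1, row in joint.items():
            p_v1 = sum(row.values())
            if p_v1:
                out[(v1, freeze({h1: q / p_v1 for h1, q in row.items()}))] = p_v1
        return out
    return run

start = (0, freeze({h: Fr(1, 4) for h in range(4)}))  # h uniform on {0,1,2,3}

# v:= h mod 2, h unchanged: the parity of h leaks exactly.
leak_parity = assign(lambda v, h: point(h % 2), lambda v, h: point(h))
# v:∈ (h mod 2) correctly with probability 3/4, flipped otherwise.
noisy_parity = assign(lambda v, h: {h % 2: Fr(3, 4), 1 - h % 2: Fr(1, 4)},
                      lambda v, h: point(h))

for (v1, d1), p in sorted(leak_parity(start).items()):
    print(v1, d1, p)
```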

5 Refinement increases entropy compositionally 

Our advocacy of stepwise refinement [IS] for development of quantitatively noninterference-secure programs suggests comparisons of specifications S with implementations I. Say that S is "Shannon-refined" by I, writing $S \sqsubseteq_{SE} I$, just when for every incoming distribution of hidden values h the a posteriori conditional Shannon Entropy produced by I, for h, is at least that produced by S (as in §3.3 above). Stepwise refinement with respect to Shannon Entropy requires transitivity of $(\sqsubseteq_{SE})$ obviously; but it also would require that $S \sqsubseteq_{SE} I$ imply $C(S) \sqsubseteq_{SE} C(I)$ for any context C; that is, it should be compositional. And it is not, in general.

Define analogously $(\sqsubseteq_{BR})$ for comparing conditional Bayes Risk of outputs; it is not compositional either.

The refinement relation $(\sqsubseteq)$ introduced earlier [27], and extended here for iteration, in fact is compositional; and furthermore, it implies both $(\sqsubseteq_{SE})$ and $(\sqsubseteq_{BR})$. (Counter-examples for compositionality of $(\sqsubseteq_{SE})$ and $(\sqsubseteq_{BR})$ are given in the extended version of that work.) We now explain refinement.

5.1 Comparing hyperdistributions 

We begin for simplicity with an entirely hidden state $\mathcal{X}$ (i.e. without $\mathcal{Y}$), thus having hypers $\mathbb{D}^2\mathcal{X}$. We ask whether, for (each) fixed observation $y_1$, one HMM "reveals more" than the other in a sense made precise as follows.

A hyper $\Delta_S$ in $\mathbb{D}^2\mathcal{X}$, produced say as the output of one HMM, is "refined by" another hyper $\Delta_I$, produced by another HMM of the same type, if two distributions $\delta_S^{\{1,2\}}{:}\,\mathbb{D}\mathcal{X}$ in the support of $\Delta_S$ can be merged to form a single distribution $\delta_I$ in what becomes $\Delta_I$. This merging increases (or at least does not decrease) a variety of (conditional) entropies, including the two mentioned above. We say that one HMM is entropy-refined by another when for corresponding inputs and corresponding values of emitted observables their outgoing hypers are refinement-related.

For an example, we restrict to Booleans T, F, and write $\{x^{@p}, y^{@q}, \dots, z^{@r}\}$ for the discrete distribution assigning probabilities $p, q, \dots, r$ to values $x, y, \dots, z$: it is total if $p+q+\dots+r$ equals 1, and partial if that sum is less than 1. Suppose the specification hyper $\Delta_S$ contains two distributions $\delta_S^1 := \{T^{@1/3}, F^{@2/3}\}$ and $\delta_S^2 := \{T^{@1/2}, F^{@1/2}\}$ with probabilities $p_1 := 1/4$ and $p_2 := 1/3$ respectively: thus $\Delta_S$ is partial and can itself be written $\{(\delta_S^1)^{@1/4}, (\delta_S^2)^{@1/3}\}$. We first calculate a weighted merge as follows:

• Scale $\delta_S^{\{1,2\}}$ by their respective probabilities $p_{\{1,2\}}$ in $\Delta_S$ to get partial distributions $\{T^{@1/12}, F^{@1/6}\}$ and $\{T^{@1/6}, F^{@1/6}\}$.

• Add those together pointwise to get $\{T^{@1/4}, F^{@1/3}\}$.

• Normalise to get $\delta_I := \{T^{@3/7}, F^{@4/7}\}$ with probability $p := 7/12$ in $\Delta_I$.

Then we refine $\Delta_S$ by removing the two distributions $\delta_S^{\{1,2\}}$ (total weight 7/12) and replacing them by their weighted merge, the single $\delta_I$ (of the same weight), to give $\Delta_I$. All the other points in the support of $\Delta_S$ would carry over unchanged into $\Delta_I$; but of course this process can be repeated, since refinement is to be transitive. We see at (2) below that general entropy refinements are achieved by merges of more than two sources, having multiple targets, and by "pre-splitting" sources proportionally to allow them to participate in more than one merge: the essential idea is as given here.
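The three-step weighted merge is easy to mechanise. Here is a sketch with exact fractions, writing T, F as Python's True, False; the two inners $\{T^{@1/3}, F^{@2/3}\}$ and $\{T^{@1/2}, F^{@1/2}\}$ with weights 1/4 and 1/3 are illustrative inputs, and `weighted_merge` is a hypothetical helper name.

```python
from fractions import Fraction as Fr

def weighted_merge(parts):
    """Merge inners [(delta, weight)] of a hyper into one inner carrying their
    total weight: scale each delta by its weight, add pointwise, normalise."""
    total = sum(w for _, w in parts)
    summed = {}
    for delta, w in parts:
        for x, q in delta.items():
            summed[x] = summed.get(x, Fr(0)) + w * q
    return {x: s / total for x, s in summed.items()}, total

# Illustrative inputs, with T, F written as True, False.
d1 = {True: Fr(1, 3), False: Fr(2, 3)}
d2 = {True: Fr(1, 2), False: Fr(1, 2)}
merged, weight = weighted_merge([(d1, Fr(1, 4)), (d2, Fr(1, 3))])
print(merged, weight)  # {True: Fraction(3, 7), False: Fraction(4, 7)} 7/12
```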

^The Bayes Risk is the largest guaranteed chance that one guess of h is incorrect. 
5.2 Preliminary definition of entropy refinement

Distributions over our states in $\mathcal{X}$, i.e. in $\mathbb{D}\mathcal{X}$, are called inner distributions or just "inners." Distributions over inners, i.e. in $\mathbb{D}\mathbb{D}\mathcal{X} = \mathbb{D}^2\mathcal{X}$, are hypers, as we have seen. If we want to concentrate on the "outer" $\mathbb{D}$ of a hyper, we refer to that as the outer distribution, or just "outer." We will (briefly) need distributions of hypers, in $\mathbb{D}^3\mathcal{X}$, called super distributions or just "supers."

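Definition 5.1 Entropy refinement (preliminary definition) Let the state-space be $\mathcal{X}$, a finite set, and consider two hypers $\Delta_{S,I}{:}\,\mathbb{D}^2\mathcal{X}$. We say that $\Delta_S$ is entropy refined by $\Delta_I$, written $\Delta_S \sqsubseteq \Delta_I$, iff there is a super $\mathcal{A}{:}\,\mathbb{D}^3\mathcal{X}$ such that

$$\Delta_S = \mathit{avg}.\mathcal{A} \quad\text{and}\quad \mathit{map}.\mathit{avg}.\mathcal{A} = \Delta_I\,.$$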

We return to our example, hyper $\Delta_S$ now with three inners $\delta_S^1 := \{T^{@1/3}, F^{@2/3}\}$ and $\delta_S^2 := \{T^{@1/2}, F^{@1/2}\}$ and $\delta_S^3 := \{T^{@1}\}$ with probabilities $p_1 := 1/4$ and $p_2 := 1/3$ and $p_3 := 5/12$ respectively, where the third inner is chosen to bring the (outer's) sum to 1, i.e. to make it total.

Now to reach the entropy refinement of $\Delta_S$ given by hyper $\Delta_I$, we merge the first two inners and simply carry the third through. The mediating super $\mathcal{A}$ contains the two hypers

• hyper $\Delta^1 := \{(\delta_S^1)^{@3/7}, (\delta_S^2)^{@4/7}\}$ with probability 7/12 in $\mathcal{A}$, and

• hyper $\Delta^2 := \{(\delta_S^3)^{@1}\}$ with probability 5/12 in $\mathcal{A}$,

so that $\Delta_S = \mathit{avg}.\mathcal{A}$, for example because

$$\mathit{avg}.\mathcal{A}.\delta_S^1 \;=\; 3/7 \times 7/12 \;=\; 1/4 \;=\; p_1 \;=\; \Delta_S.\delta_S^1\,.$$

From Def. 5.1 the hyper $\Delta_I$ is therefore given by $\mathit{map}.\mathit{avg}.\mathcal{A}$, that is

• inner distribution $\mathit{avg}.\Delta^1 = \mathit{avg}.\{(\delta_S^1)^{@3/7}, (\delta_S^2)^{@4/7}\} = \{T^{@3/7}, F^{@4/7}\}$ with probability 7/12, and

• inner distribution $\mathit{avg}.\Delta^2 = \mathit{avg}.\{(\delta_S^3)^{@1}\} = \delta_S^3$ itself, carried through with probability 5/12 as we expected.
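Definition 5.1 can likewise be checked mechanically once a super is written down. In this sketch a super is a list of (hyper, weight) pairs and a hyper maps frozen inners to weights; the inners $\{T^{@1/3}, F^{@2/3}\}$, $\{T^{@1/2}, F^{@1/2}\}$, $\{T^{@1}\}$ and the super's weights are illustrative inputs. The helper names `avg_super` (for $\mathit{avg}.\mathcal{A}$) and `map_avg_super` (for $\mathit{map}.\mathit{avg}.\mathcal{A}$) are ours.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def freeze(d):
    """Hashable form of a distribution (zero-probability points dropped)."""
    return tuple(sorted((k, v) for k, v in d.items() if v))

def avg(hyper):
    """Average a hyper {frozen inner: weight} to a single distribution."""
    out = defaultdict(Fr)
    for inner, w in hyper.items():
        for x, p in inner:
            out[x] += w * p
    return dict(out)

def avg_super(sup):
    """avg applied to a super [(hyper, weight)]: flattens it to one hyper."""
    out = defaultdict(Fr)
    for hyper, w in sup:
        for inner, q in hyper.items():
            out[inner] += w * q
    return dict(out)

def map_avg_super(sup):
    """map.avg applied to a super: each hyper collapses to its average inner."""
    out = defaultdict(Fr)
    for hyper, w in sup:
        out[freeze(avg(hyper))] += w
    return dict(out)

T, F = True, False
d1 = freeze({T: Fr(1, 3), F: Fr(2, 3)})
d2 = freeze({T: Fr(1, 2), F: Fr(1, 2)})
d3 = freeze({T: Fr(1)})
A = [({d1: Fr(3, 7), d2: Fr(4, 7)}, Fr(7, 12)),  # hyper that merges d1, d2
     ({d3: Fr(1)}, Fr(5, 12))]                   # hyper that carries d3 through
Delta_S = avg_super(A)
Delta_I = map_avg_super(A)
```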

A second example of entropy refinement is given at (2) below.

It can be shown [27] that refinement is indeed a partial order: reflexivity is obvious, and anti-symmetry follows from an entropy-based argument or alternatively from "colour mixing" [JS]. Its transitivity can be shown using matrices, or by a monadic approach using general properties of map and avg and specific properties of the probabilistic functor.
6 Iteration, refinement chains
and incompleteness 

Iteration and entropy refinement taken together impose a new demand on our semantic space, closure under limits: iterations are usually defined via least fixed-points whose existence is guaranteed if the program space forms a cpo under the refinement ordering [35]. Since we have not yet introduced nontermination, the space $(\mathbb{D}^2\mathcal{X}, \sqsubseteq)$ has no least element, and so it is not a cpo. But it is not a dcpo either, as we show below: not all of its non-empty chains have a supremum.

As a result, the usual technique of defining iterations via refinement-least fixed-points will not obviously apply, even after extending the space and its ordering to incorporate nonterminating behaviours, and we will have to do something slightly different (§7.3).

6.1 An example of incompleteness 

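Define again $\mathcal{X} := \{T, F\}$, and let $\delta_p$ be the inner $\{T^{@p}, F^{@1-p}\}$, alternatively written $T_p{\oplus}F$, for any $0 \le p \le 1$. Form the sequence of hypers

$$\begin{array}{lcl}
\Delta_1 &:=& \{\delta_0^{@1/2},\ \delta_1^{@1/2}\} \\
\Delta_2 &:=& \{\delta_0^{@1/4},\ \delta_{1/2}^{@1/2},\ \delta_1^{@1/4}\} \\
\Delta_3 &:=& \{\delta_0^{@1/8},\ \delta_{1/4}^{@1/4},\ \delta_{1/2}^{@1/4},\ \delta_{3/4}^{@1/4},\ \delta_1^{@1/8}\} \\
 &\vdots&
\end{array} \qquad (2)$$

in $\mathbb{D}^2\mathcal{X}$, whose pattern should be evident.

From Def. 5.1 we see that each of these hypers is an entropy refinement of the preceding: for example to get from $\Delta_2$ to $\Delta_3$ we first "pre-split" $\Delta_2$ into smaller pieces

$$\{\delta_0^{@1/8},\ \underbrace{\delta_0^{@1/8},\ \delta_{1/2}^{@1/8}}_{\text{merge}},\ \delta_{1/2}^{@1/4},\ \underbrace{\delta_{1/2}^{@1/8},\ \delta_1^{@1/8}}_{\text{merge}},\ \delta_1^{@1/8}\}$$

and then merge the selected inners as explained above, that is

$$\delta_0 + \delta_{1/2} \mapsto \delta_{1/4} \quad\text{and}\quad \delta_{1/2} + \delta_1 \mapsto \delta_{3/4}$$

(each an equal-weight merge), to give $\Delta_3$ when we allow the un-merged distributions simply to carry through.

Now by symmetry any hyper $\Delta$ that was a refinement-limit of the chain (2) would have to be uniform (except for the endpoints) but with a countably infinite support, since the supports of the chain's elements grow without bound; and uniform, infinite and discrete distributions do not exist. The actual limit of that refinement chain is in fact the measure over the distributions $\delta_p$ given by taking $p$ uniformly from $[0,1]$, and that is outside our space $\mathbb{D}^2\mathcal{X}$. Writing $\mathbb{M}$ for "measure" (informally, i.e. without being specific about the sigma-algebra) we find our limit in $\mathbb{M}\mathbb{D}\mathcal{X}$ rather than $\mathbb{D}^2\mathcal{X}$.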
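The refinement steps of chain (2) can be checked mechanically for the first few hypers. The sketch assumes the pattern of (2) in which $\Delta_n$ puts weight $1/2^{n-1}$ on each interior inner $\delta_{k/2^{n-1}}$ and half that on $\delta_0$ and $\delta_1$. It represents each inner $\delta_p$ by its parameter $p$, builds a mediating super out of carry-through columns and equal-weight merge columns, and confirms both equations of Def. 5.1; the helper names are ours. The final line illustrates how the supports keep growing.

```python
from collections import defaultdict
from fractions import Fraction

def chain_hyper(n):
    """Delta_n of chain (2), n >= 1, as {p: weight of inner delta_p}: points
    k/2^(n-1), interior weight 1/2^(n-1), endpoint weight 1/2^n (assumed)."""
    m = 2 ** (n - 1)
    return {Fraction(k, m): Fraction(1, 2 * m) if k in (0, m) else Fraction(1, m)
            for k in range(m + 1)}

def mediating_super(n):
    """A super for Delta_n ⊑ Delta_{n+1}, as [(column, weight)] where each column
    is a hyper {p: conditional weight}: one carry-through column per point of
    Delta_n, plus one equal-weight merge column per pair of neighbours."""
    target = chain_hyper(n + 1)
    ps = sorted(chain_hyper(n))
    columns = [({p: Fraction(1)}, target[p]) for p in ps]
    half = Fraction(1, 2)
    for a, b in zip(ps, ps[1:]):
        columns.append(({a: half, b: half}, target[(a + b) / 2]))
    return columns

def avg_of(columns):
    """avg applied to the super: the hyper it flattens to."""
    out = defaultdict(Fraction)
    for column, w in columns:
        for p, q in column.items():
            out[p] += w * q
    return dict(out)

def map_avg_of(columns):
    """map.avg applied to the super. Since delta_p assigns T probability p,
    linearly, averaging {delta_p @ q} gives delta at the weighted mean of p."""
    out = defaultdict(Fraction)
    for column, w in columns:
        out[sum(q * p for p, q in column.items())] += w
    return dict(out)

for n in range(1, 6):
    A = mediating_super(n)
    assert avg_of(A) == chain_hyper(n)          # Delta_n = avg.A
    assert map_avg_of(A) == chain_hyper(n + 1)  # map.avg.A = Delta_{n+1}
print(len(chain_hyper(10)))  # 513 support points, and still growing
```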
6.2 Dealing with the incompleteness: proper measures

Pursuing the strategy suggested above would lead us through steps like these:

1. Define a metric over $\mathbb{D}\mathcal{X}$, i.e. provide a distance function between (discrete) distributions. For reasons we explain in §12 we would choose the Kantorovich metric [13], which is advocated for this kind of application anyway.

2. Generate the Borel algebra from the Kantorovich metric.

3. Define refinement between hypers that are proper measures, a generalisation of the "split/merge" of Def. 5.1 and explained in our previous work [27] for the discrete case.

4. Observe that the resulting, more general semantic space $\mathbb{D}\mathcal{X}\to\mathbb{M}\mathbb{D}\mathcal{X}$ allows (still) a monadic treatment of sequential composition.

5. Define the program semantics (§8) in that more sophisticated space.

But we do not do that here. Instead, in this report we limit our extensions to just what will suffice for the quantitative security of iterative programs, including making refinement-based comparisons between them, as part of our general programme of expanding the scope of this approach to deal with realistic situations. In fact, we will see that refinement chains generated by the fixed-point definition of loops do have a refinement-supremum in our space; that is, "loop-approximant chains" are a strict subset of all possible refinement chains, and in particular do not contain examples like (2) above.

Thus we can remain within the space of discrete hypers $\mathbb{D}^2\mathcal{H}$, which allows a drastic simplification (compared with $\mathbb{M}\mathbb{D}\mathcal{H}$) of the presentation. How this is done is the topic of the next section: we will be using partial discrete distributions to represent nontermination, thus concentrating on a space $\mathcal{V}\times\mathbb{D}\mathcal{H}\to\underline{\mathbb{D}}(\mathcal{V}\times\mathbb{D}\mathcal{H})$ for denotations of programs.



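^Using our formal definition, Def. 5.1 introduces a super $\mathcal{A}$ to mediate the entropy refinement $\Delta_2 \sqsubseteq \Delta_3$; it is given by

$$\begin{array}{l|ccccc|l}
 & 1 & 2 & 3 & 4 & 5 & \mathit{avg} \\ \hline
\delta_0 & 1/8 & 1/8 & & & & 1/4 \\
\delta_{1/2} & & 1/8 & 1/4 & 1/8 & & 1/2 \\
\delta_1 & & & & 1/8 & 1/8 & 1/4 \\ \hline
\mathit{map}.\mathit{avg} & \delta_0^{@1/8} & \delta_{1/4}^{@1/4} & \delta_{1/2}^{@1/4} & \delta_{3/4}^{@1/4} & \delta_1^{@1/8} &
\end{array}$$

Normalising the columns gives the hypers of $\mathcal{A}$, each carrying its column total as weight; the row sums (avg) give $\Delta_2$, and averaging each column gives the inners of $\Delta_3$ (map.avg). The two merges of inners referred to in the main text occur in columns 2, 4; the columns 1, 3, 5 are the inners that carry through unchanged from $\Delta_2$ to $\Delta_3$.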
7 Semantics for the HMM-interpretation
of secure iterating programs



7.1 Denotations of programs 

Here we give a precise construction of a semantic space, and the interpretation of 
a small programming language in it. The language includes probability, visible 
vs. hidden variables, and iteration. 

For noninterference we imagine a finite underlying state space of two parts, named $\mathcal{V}$ and $\mathcal{H}$, where $\mathcal{V}$ is the "visible" part of the state and $\mathcal{H}$ is its "hidden" part. Because the $\mathcal{H}$ part is hidden, our underlying state space will not be simply the Cartesian product of those two components, but rather the set $\mathcal{S}:=\mathcal{V}\times\mathbb{D}\mathcal{H}$ comprising the product of the visible part $\mathcal{V}$ (as is) and the distributions $\mathbb{D}\mathcal{H}$ over the hidden part.

For nontermination we consider program outputs to be of type $\mathbb{D}\mathcal{S}$, that is $\mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$, the partial distributions over $\mathcal{S}$. This represents a slight generalisation of the type suggested above for programs, in that the partiality (the one-deficit) is used to describe the probability of the program's failing to terminate [m, 20, Ml, HE]. As before, we call elements of $\mathbb{D}\mathcal{S}$ hypers, referring if necessary to partial hypers when the distinction is important. Thus $\mathcal{S}\to\mathbb{D}\mathcal{S}$, that is $\mathcal{V}\times\mathbb{D}\mathcal{H}\to\mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$, is the type we propose for programs: from an initial state $(v,\delta)$ in $\mathcal{S}$ a program determines, as its final output, a partial distribution over $\mathcal{S}$, i.e. a distribution whose support elements have the structure $(v',\delta')$.

Recall from §3.4 that we write function application as $f.x$, with "." associating to the left. Operators without their operands are written between parentheses, as $(\le)$ for example.



7.2 The Entropy Refinement Order between programs 

As usual our orders on programs will be the pointwise orders on their results. 
Our first order is the Entropy Refinement set out at Def. 5.1, adapted to deal with partial hypers and to take the $\mathcal{V}$ portion of the state into account.



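Definition 7.1 Entropy refinement (generalising Def. 5.1) Let the state space be $\mathcal{S}:=\mathcal{V}\times\mathbb{D}\mathcal{H}$, with $\mathcal{V},\mathcal{H}$ both finite, and define $Q:\mathcal{S}\to\mathbb{D}(\mathcal{V}\times\mathcal{H})$ with $Q.(v,\delta).(v',h')$ equal to $\delta.h'$ if $v=v'$, and zero otherwise.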



The latter (known as sections in functional programming) allows us easily to write expressions relating operators themselves, such as the succinct $(<)\subseteq(\le)$ stating that less-than is a subset of less-than-or-equals as a relation. Thus the former "dot" convention distinguishes function application from sections as well.

As a further example (though not needed in this report), as part of the definition of the Giry monad one defines evaluation functions $E_B$ that, given a measure $\mu$ as argument, return $\mu$ applied to the measurable set $B$ as the result. With sections and the "dot" convention one writes directly $(.B)$ for this function: the well-established syntactical rules for sections then ensure that $E_B(\mu)=(.B).\mu=\mu.B$ automatically. A separate introduction, definition and explanation of the $E_B$ notation is not necessary.

More succinctly, this defines the product distribution $Q.(v,\delta):=\{\!\{v\}\!\}\times\delta$.
For two hypers $\Delta_{S,I}$ in $\mathbb{D}\mathcal{S}$, we say that $\Delta_S$ is entropy refined by $\Delta_I$, writing $\Delta_S\preccurlyeq\Delta_I$, just when $\mathsf{map}.Q.\Delta_S\preccurlyeq\mathsf{map}.Q.\Delta_I$ according to our preliminary definition Def. 5.1 of entropy refinement, but taking our $\mathcal{V}\times\mathcal{H}$ here all at once as just $\mathcal{X}$ there, and generalising map and avg to partial distributions in the obvious way. □

Like the preliminary definition, $(\preccurlyeq)$ defines a partial order on hyper-distributions. Note that a consequence of this definition is that entropy refinement does not change the distribution of the visible variables: that is, if $\Delta_S\preccurlyeq\Delta_I$ then in fact $\overline{\Delta_S}=\overline{\Delta_I}$, where we recall that $\overline{\Delta}$ is the left-marginal distribution of the product distribution $\Delta:\mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$. Similarly, the a priori distribution of $h$ associated with each value of $v$ is left unchanged.
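As an illustration only, the following minimal Python sketch (our own, not part of the formal development) represents a hyper on $\mathcal{V}\times\mathbb{D}\mathcal{H}$ as a dict from (v, inner) pairs to probabilities, with each inner frozen into a hashable frozenset. The hypothetical `merge` performs one elementary entropy-refinement step, averaging two inners that share the same v, and the checks confirm that both the left marginal and the joint distribution on (v, h), i.e. avg.(map.Q.Δ), are unchanged. It is not a decision procedure for $(\preccurlyeq)$ in general.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(dist):
    """Hashable form of an inner distribution (zero entries dropped)."""
    return frozenset((h, p) for h, p in dist.items() if p != 0)

def left_marginal(hyper):
    """Distribution of the visible component v, ignoring the inners."""
    out = defaultdict(Fraction)
    for (v, _inner), p in hyper.items():
        out[v] += p
    return dict(out)

def flatten(hyper):
    """Joint distribution on (v, h) pairs, i.e. avg.(map.Q.hyper)."""
    out = defaultdict(Fraction)
    for (v, inner), p in hyper.items():
        for h, q in inner:
            out[(v, h)] += p * q
    return dict(out)

def merge(hyper, v, inner1, inner2):
    """Average two distinct inners sharing visible value v (one merge step)."""
    assert inner1 != inner2
    p1, p2 = hyper[(v, inner1)], hyper[(v, inner2)]
    averaged = defaultdict(Fraction)
    for inner, p in ((inner1, p1), (inner2, p2)):
        for h, q in inner:
            averaged[h] += p * q / (p1 + p2)
    out = {k: w for k, w in hyper.items() if k not in {(v, inner1), (v, inner2)}}
    key = (v, freeze(averaged))
    out[key] = out.get(key, Fraction(0)) + p1 + p2
    return out

# Example: v = 0 carries two point inners, v = 1 carries one.
a, b = freeze({"a": Fraction(1)}), freeze({"b": Fraction(1)})
before = {(0, a): Fraction(1, 4), (0, b): Fraction(1, 4), (1, a): Fraction(1, 2)}
after = merge(before, 0, a, b)
print(left_marginal(before) == left_marginal(after))  # True
print(flatten(before) == flatten(after))              # True
```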

We now address the incompleteness issue raised in §6.1.



7.3 The Termination Refinement Order between programs 

We follow an approach that allows us to distinguish between chains produced by iteration and those produced by refinement more generally [41, 42]: we use a stronger order for which our space is complete.

For a partial hyper $\Delta:\mathbb{D}\mathcal{S}$ the probability that it terminates is just its total weight, written $\sum\Delta$; equivalently, the amount by which it fails to sum to 1 is its probability of nontermination. We define a partial order that allows increase of termination only, as follows:

Definition 7.2 Termination Refinement For $\Delta_{S,I}$ in $\mathbb{D}\mathcal{S}$, we say that $\Delta_S$ is termination-refined by $\Delta_I$, written $\Delta_S\le\Delta_I$, just when for all $s=(v,\delta)$ in $\mathcal{S}$ we have $\Delta_S.s\le\Delta_I.s$. This is simply the pointwise extension of $(\le)$ on the real-valued probabilities. □
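A minimal sketch of Def. 7.2, assuming hypers are dicts from outcomes to probabilities in which an absent outcome counts as 0; the outcome labels `s1`, `s2` are hypothetical stand-ins for pairs $(v,\delta)$, and `weight`, `termination_refines` are illustrative names.

```python
from fractions import Fraction

def weight(hyper):
    """Total weight of a partial hyper: its probability of termination."""
    return sum(hyper.values(), Fraction(0))

def termination_refines(hyper_s, hyper_i):
    """Def. 7.2: hyper_s <= hyper_i pointwise, a missing outcome counting as 0."""
    outcomes = set(hyper_s) | set(hyper_i)
    return all(hyper_s.get(s, Fraction(0)) <= hyper_i.get(s, Fraction(0))
               for s in outcomes)

# Hypothetical outcomes s1, s2 stand in for (v, delta) pairs.
spec = {"s1": Fraction(1, 4)}
impl = {"s1": Fraction(1, 2), "s2": Fraction(1, 4)}
print(termination_refines(spec, impl), termination_refines(impl, spec))  # True False
print(1 - weight(spec), 1 - weight(impl))  # 3/4 1/4
```

The implementation terminates more often (nontermination drops from 3/4 to 1/4) without moving any probability between outcomes, which is exactly what $(\le)$ permits.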

Our space is trivially closed under suprema of chains in this termination order, since the probabilities themselves are bounded above (by 1). We will in due course show that the fixed-point definition of iteration generates termination chains, and so the completeness here will give us just the well-definedness we need. That is, we will rely on

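Lemma 7.1 Termination completeness Let $\Delta_0\le\Delta_1\le\cdots$ be a $(\le)$-chain of hypers in $\mathbb{D}\mathcal{S}$. Then the chain has a $(\le)$-least upper bound $\bigvee_i\Delta_i$ in $\mathbb{D}\mathcal{S}$.
Proof: Completeness of $[0,1]$: the pointwise supremum exists, and its total weight is the supremum of the weights $\sum\Delta_i$, hence at most 1. □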

Note that everywhere-terminating programs are maximal in this cpo. 



7.4 Secure refinement between programs 

The primary order of interest on our space, secure refinement, allows both entropy refinement and termination refinement:
Definition 7.3 Secure Refinement Given (partial) hypers $\Delta_{S,I}$ in $\mathbb{D}\mathcal{S}$, we define Secure Refinement as the composition of the two other orders: first termination refinement, and then entropy refinement. We have $\Delta_S\sqsubseteq\Delta_I$ just when there is an intermediate hyper $\Delta$ such that $\Delta_S\le\Delta$ and $\Delta\preccurlyeq\Delta_I$. □

Observe trivially that $(\le)$ is a strengthening of $(\sqsubseteq)$, by reflexivity of $(\preccurlyeq)$. Like termination and entropy refinement, secure refinement is a partial order on hyper-distributions. Reflexivity holds trivially from that of $(\le)$ and $(\preccurlyeq)$. The transitivity of $(\sqsubseteq)$ follows from transitivity of the two other orders, plus the fact that $(\sqsubseteq)\supseteq(\preccurlyeq)\circ(\le)$, i.e. an entropy refinement followed by a termination refinement is again a secure refinement. For antisymmetry we reason that if $A\sqsubseteq C$ and $C\sqsubseteq A$, then there must exist $B$ and $D$ such that $A\le A{+}B\preccurlyeq C$ and $C\le C{+}D\preccurlyeq A$. From reflexivity of $(\le)$ and transitivity of $(\sqsubseteq)$ we then have that both $A{+}B\sqsubseteq A$ and $C{+}D\sqsubseteq C$, and thus both $B$ and $D$ must be zero, since $(\sqsubseteq)$ cannot decrease the overall weight of a hyper-distribution. From this we have that $A\preccurlyeq C$ and $C\preccurlyeq A$, hence $A=C$ by antisymmetry of $(\preccurlyeq)$.

The definition of program refinement is the pointwise extension of the above, 
that is 

Definition 7.4 Secure Program Refinement Let $S, I$ be programs' meanings of type $\mathcal{S}\to\mathbb{D}\mathcal{S}$. We say that $S\sqsubseteq I$ just when for all initial states $s:\mathcal{S}$ we have $S.s\sqsubseteq I.s$ according to Def. 7.3. □



7.5 Least fixed points in $\mathcal{S}\to\mathbb{D}\mathcal{S}$:
getting around incompleteness

The normal approach to fixed-point semantics for loops would be to show that a loop defines a $(\sqsubseteq)$-continuous functional $\mathcal{L}$ over the program space $\mathcal{S}\to\mathbb{D}\mathcal{S}$, and then to take the $(\sqsubseteq)$-supremum of the chain $\Pi\sqsubseteq\mathcal{L}.\Pi\sqsubseteq\mathcal{L}^2.\Pi\sqsubseteq\cdots$, where $\Pi$ is the least program, the one producing the output hyper of zero weight for all inputs.

Here instead we show that a loop defines a $(\le)$-continuous functional $\mathcal{L}$, and then take the $(\le)$-supremum of the chain $\Pi\le\mathcal{L}.\Pi\le\mathcal{L}^2.\Pi\le\cdots$. Its well-definedness follows from Lem. 7.1; its relevance is justified by the following lemma.

Lemma 7.2 Equivalence of fixed points Let partial orders $(\le)$ and $(\sqsubseteq)$ be defined over some space $X$, and let $\mathcal{L}$ be an endofunction on $X$. Suppose further that $(\le)\subseteq(\sqsubseteq)$, that is, that $(\le)$ implies $(\sqsubseteq)$.

If a $(\le)$-least (resp. greatest) fixed point of $\mathcal{L}$ exists, then a $(\sqsubseteq)$-least (resp. greatest) fixed point of $\mathcal{L}$ also exists, and in fact they are equal.
Proof: Let $x$ be the $(\le)$-least fixed point of $\mathcal{L}$. Then for any (other) fixed point $x'$ of $\mathcal{L}$ we have $x\le x'$ and so, by assumption, also $x\sqsubseteq x'$. Thus $x$ is a $(\sqsubseteq)$-lower bound for all fixed points; but it is a fixed point itself. Therefore it is the $(\sqsubseteq)$-least fixed point as well. (The same argument holds for greatest.) □
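The following sketch illustrates the kind of $(\le)$-chain of loop approximants to which Lem. 7.1 and Lem. 7.2 are applied. To keep it short it assumes there is no hidden variable, so that every inner is a point and a partial hyper reduces to a sub-distribution over a hypothetical visible state space {0, 1, 2, 3}; the loop $\mathsf{while}\;v<3\;\mathsf{do}\;(v{:=}v{+}1)\;{}_{1/2}\!\oplus\;\mathsf{skip}\;\mathsf{od}$ and all function names are our own. Starting from the zero-weight program, each unfolding of $\mathcal{L}$ increases the output pointwise.

```python
from collections import defaultdict
from fractions import Fraction

STATES = range(4)   # hypothetical visible state space {0, 1, 2, 3}

def guard(v):
    """Probability of entering the body: the Boolean guard v < 3 as 1 or 0."""
    return Fraction(1) if v < 3 else Fraction(0)

def body(v):
    """Hypothetical body: (v := v+1) 1/2(+) skip."""
    return {v + 1: Fraction(1, 2), v: Fraction(1, 2)}

def unfold(X):
    """One application of L: L.X = (body; X) guard(+) skip, where X
    tabulates a program as {initial state: sub-distribution of finals}."""
    table = {}
    for v in STATES:
        g = guard(v)
        out = defaultdict(Fraction)
        if g:
            for u, p in body(v).items():
                for t, q in X[u].items():
                    out[t] += g * p * q
        out[v] += 1 - g
        table[v] = {t: w for t, w in out.items() if w != 0}
    return table

def approximants(n):
    """The chain abort <= L.abort <= ... <= L^n.abort."""
    chain = [{v: {} for v in STATES}]   # abort: zero-weight output everywhere
    for _ in range(n):
        chain.append(unfold(chain[-1]))
    return chain

def weight(dist):
    return sum(dist.values(), Fraction(0))

def below(X, Y):
    """Pointwise termination order (Def. 7.2) between tabulated programs."""
    return all(X[v].get(t, Fraction(0)) <= Y[v].get(t, Fraction(0))
               for v in STATES for t in set(X[v]) | set(Y[v]))

chain = approximants(6)
print(all(below(X, Y) for X, Y in zip(chain, chain[1:])))  # True
print([str(weight(X[2])) for X in chain])  # ['0', '0', '1/2', '3/4', '7/8', '15/16', '31/32']
```

From $v=2$ the $n$th approximant terminates with probability $1-(1/2)^{n-1}$ for $n\ge1$, so the weights climb towards 1 without ever exceeding it.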

In the next section we will introduce our language to express and reason about secure programming, extending our previous work with iteration. We use the termination order of Def. 7.2 for the semantics of loops, relying on Lem. 7.2 to ensure that each loop is also well-defined as a least fixed point in the security order; we do, of course, need to show that the assumption of $(\le)$-continuity is satisfied by the semantic definitions we give. In the conclusion we shall return to the question of $(\sqsubseteq)$-limits more generally, i.e. those which are not restricted to $(\le)$-limits.



8 Programming language 

Having tied down the details of our semantic space, we can now give our programs' denotations via structural induction; however there are two potential sources of complexity in what we present. The first, conceptual, is the two-level structure that we motivated in the sections above: partial distributions that are themselves taken over other conditional, or sometimes even a posteriori, distributions.

The second is notational: standard constructions like conditionals and push-forward are now generated by program fragments that, as a rule, are expressions over free variables (i.e. the variables of the program) rather than (pure) mathematical functions themselves. This leads to uncomfortable expositions like "$\Pr(D\mid E)$ where distribution $D(x)$ is given by $\cdots x\cdots$ and predicate $E(x)$ holds just when $\cdots x\cdots$". Although these are easy to understand (being well-established notations), they are hard to manipulate algebraically in specific cases where $D, E$ are determined by some computer program.

We now introduce specialised notation to streamline our semantic definitions. 



8.1 Distribution comprehensions 

Recall that the support $\lceil\delta\rceil$ of a distribution $\delta:\mathbb{D}\mathcal{X}$ is the set of those elements $x:\mathcal{X}$ with $\delta.x\neq0$; naturally for $\delta:\mathbb{D}\mathcal{X}$ we have $\lceil\delta\rceil\subseteq\mathcal{X}$. The weight of $\delta$ is written $\sum\delta$, defined as $\delta.\mathcal{X}$, so that full distributions have weight 1. Distributions can be scaled and summed according to the usual pointwise extension of multiplication and addition to real-valued functions, provided the outcomes are again distributions.

Given a non-empty finite set $\mathcal{X}$ we write $\lfloor\mathcal{X}\rfloor$ for the uniform distribution over $\mathcal{X}$, that is the uniform distribution $\delta:\mathbb{D}\mathcal{X}$ such that $\lceil\delta\rceil=\mathcal{X}$.



8.1.1 Enumerated distributions and expected values 

These are notations for enumerated distributions, i.e. those in which the support 
is explicitly listed (cf. set enumerations that list a set's elements): 

- empty: The empty, or zero, subdistribution has empty support and assigns probability zero to all elements: we write it $\{\!\{\}\!\}$.

- multiple: We write $\{\!\{x@p,\,y@q,\,\ldots,\,z@r\}\!\}$ for the distribution assigning probabilities $p,q,\ldots,r$ to elements $x,y,\ldots,z$ respectively, with $p+q+\cdots+r\le1$. Provided $p,q,\ldots,r>0$, the support is therefore the set $\{x,y,\ldots,z\}$.

- point: The distribution concentrated on a single element $x$ is written $\{\!\{x\}\!\}$, i.e. abbreviating $\{\!\{x@1\}\!\}$, whose support is $\{x\}$.

- uniform: When explicit probabilities are omitted they are implicitly uniform: thus $\{\!\{x,y,z\}\!\}$ is $\{\!\{x@\tfrac13,\,y@\tfrac13,\,z@\tfrac13\}\!\}$.

- binary, and distributed uniform: For a two-element distribution we write $x\;{}_p\!\oplus\;y$ for $\{\!\{x@p,\,y@1{-}p\}\!\}$, and in the uniform case we can write $x\oplus y\oplus\cdots\oplus z$ for $\{\!\{x,y,\ldots,z\}\!\}$.

For expected values of random variables that are written as expressions, we have

- expected value: We write $(\mathcal{E}\,d{:}\delta\cdot E)$ for the expected value $\sum_{d:\lceil\delta\rceil}(\delta.d\times E)$ of expression $E$, interpreted as a random variable in $d$, over distribution $\delta$.

  If $E$ is Boolean, then it is taken to be 1 if $E$ holds and 0 otherwise, so that the expected value is then just the combined probability in $\delta$ of all elements $d$ satisfying $E$. If necessary for clarity we will write $[E]$ to indicate $E$'s conversion from Boolean to 0, 1; when possible, however, we omit it (to reduce proliferation of brackets).
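The enumeration notations and expected values above can be sketched in Python as follows, assuming distributions are dicts from elements to exact `Fraction` probabilities; the function names (`enum`, `point`, `uniform`, `binary`, `expect`) are hypothetical and serve only to mirror the notation.

```python
from fractions import Fraction

def enum(*pairs):
    """{{x@p, y@q, ...}}: an explicit (sub-)distribution; weights must sum to <= 1."""
    dist = {}
    for x, p in pairs:
        dist[x] = dist.get(x, Fraction(0)) + Fraction(p)
    assert sum(dist.values(), Fraction(0)) <= 1, "weights exceed 1"
    return {x: p for x, p in dist.items() if p != 0}

def point(x):
    """{{x}}, abbreviating {{x@1}}."""
    return enum((x, 1))

def uniform(*xs):
    """{{x, y, ..., z}}: implicitly uniform probabilities."""
    return enum(*((x, Fraction(1, len(xs))) for x in xs))

def binary(x, p, y):
    """x p(+) y, i.e. {{x@p, y@(1-p)}}."""
    return enum((x, p), (y, 1 - Fraction(p)))

def support(dist):
    return {x for x, p in dist.items() if p > 0}

def weight(dist):
    return sum(dist.values(), Fraction(0))

def expect(dist, E):
    """(E d:dist . E(d)); a Boolean value is converted to 0/1 first, as with [E]."""
    total = Fraction(0)
    for d, p in dist.items():
        value = E(d)
        total += p * (int(value) if isinstance(value, bool) else value)
    return total

die = uniform(1, 2, 3, 4)
print(expect(die, lambda d: d), expect(die, lambda d: d % 2 == 0))  # 5/2 1/2
```

The second expected value shows the Boolean convention: the expectation of "$d$ is even" is just the probability of that event.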



8.1.2 Distribution comprehensions, conditioning 
and a posteriori values 

As for set comprehensions, with distribution comprehensions we describe a dis- 
tribution by giving a rule for forming it, i.e. its supporting elements and the 
probabilities they have. Here are the common cases: 



- map, push-forward: When $f$ in §3.4 is given as an expression $E$ of type $\mathcal{Y}$, with free variable $x$ say, then for the push-forward distribution $\mathsf{map}.f.\delta$ we write the comprehension $\{\!\{x{:}\delta\cdot E\}\!\}$, where for $y:\mathcal{Y}$ we define

$$\{\!\{x{:}\delta\cdot E\}\!\}.y \;:=\; (\mathcal{E}\,x{:}\delta\cdot E{=}y)\,.$$

Recall from above that the Boolean value $E{=}y$ is to be converted implicitly to 0, 1 in this case.

- conditional distribution: Given a distribution $\delta:\mathbb{D}\mathcal{X}$ and a Boolean expression $R$ in free variable $x$, we write $\{\!\{x{:}\delta\mid R\}\!\}$ for the distribution obtained by conditioning $\delta$ on the set (the event) that $R$ represents as a predicate in $x$. Thus for $x':\mathcal{X}$ we have

$$\{\!\{x{:}\delta\mid R\}\!\}.x' \;:=\; \delta.x'\times[R']\;/\;(\mathcal{E}\,x{:}\delta\cdot[R])\,, \tag{3}$$

where $R'$ is $R$ with $x$ replaced by $x'$ and here, for clarity, with $[\cdot]$ we make the conversions to 0, 1 explicit.

- a posteriori values: Finally, for Bayesian belief revision suppose $\delta$ is an a priori distribution over some $\mathcal{X}$, and let expression $R$ (not Boolean) in free variable $x$ be the probability of a certain observable result if that $x$ is chosen. Then $\{\!\{x{:}\delta\mid R\}\!\}$ is the a posteriori distribution (revising $\delta$) when that result actually occurs. The definition is as for (3) immediately above, but using just $R$ rather than $[R]$.

  Note that $R$ can be scaled without affecting the value of this expression, so without loss of generality it can be made one-summing as $x$ varies: this makes it easier to interpret as a probabilistic outcome that triggers Bayesian belief revision.

- general distribution comprehension: We can combine all the above possibilities by writing $\{\!\{x{:}\delta\mid R\cdot E\}\!\}$, for distribution $\delta$, real expression $R$ (in $x$) and expression $E$ (also in $x$), to mean

$$(\mathcal{E}\,x{:}\delta\cdot R\times\{\!\{E\}\!\})\;/\;(\mathcal{E}\,x{:}\delta\cdot R)\,, \tag{4}$$

where, first, an expected value is formed in the numerator by scaling and adding the point distribution $\{\!\{E\}\!\}$ as a real-valued function: this gives another (sub-)distribution. The scalar denominator then conditions on $R$.

  A missing $E$ is implicitly $x$ itself. If $R$ is omitted, then $(R\times)$ is removed from the numerator, and the denominator is removed altogether. (When $\delta$ is a full distribution, this happens automatically by assuming a missing $R$ to be 1, or equivalently Boolean true.)

As a concrete example we recall the puzzle 

In families with two children of equally and independently distributed 
gender, if one child is a boy what is the chance that the other is too? 

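Encoding boy, girl as Booleans $T, F$, we write $\{\!\{x,y{:}T\oplus F\mid x\lor y\cdot x\land y\}\!\}$ for the distribution of the pushed-forward function both boys ($x\land y$) over the iid gender joint distribution of the two children ($x,y{:}T\oplus F$), conditioned on the event at least one boy ($x\lor y$). It works out as

$$\begin{aligned}
&\{\!\{x,y{:}T\oplus F\mid x\lor y\cdot x\land y\}\!\}\\
=\;&(\mathcal{E}\,x,y{:}T\oplus F\cdot[x\lor y]\times\{\!\{x\land y\}\!\})\;/\;(\mathcal{E}\,x,y{:}T\oplus F\cdot[x\lor y])\\
=\;&(1\times\{\!\{T\}\!\}/4+1\times\{\!\{F\}\!\}/4+1\times\{\!\{F\}\!\}/4+0\times\{\!\{F\}\!\}/4)\;/\;(3/4)\\
=\;&\{\!\{T@\tfrac14,\,F@\tfrac12\}\!\}\;/\;(3/4)\\
=\;&T\;{}_{1/3}\!\oplus\;F\,,
\end{aligned}$$

that is (as we know) the a posteriori probability of "both boys" is 1/3.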
This is the kind of calculation that specific programs' semantics generate. 
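The general comprehension (4) admits a direct sketch, assuming distributions are dicts from elements to exact `Fraction` probabilities; `comprehension` and `as_real` are hypothetical names, and returning the empty sub-distribution when the denominator is zero is our own convention, since (4) is undefined there. Running it on the two-children puzzle reproduces the a posteriori probability 1/3.

```python
from fractions import Fraction

def as_real(value):
    """The conversion [E]: Booleans become 0 or 1, numbers pass through."""
    return Fraction(int(value)) if isinstance(value, bool) else Fraction(value)

def comprehension(delta, R=None, E=lambda x: x):
    """{{x:delta | R . E}} following (4); E defaults to x itself.
    Omitting R drops both the scaling by R and the denominator."""
    numerator = {}
    for x, p in delta.items():
        w = p * (as_real(R(x)) if R is not None else 1)
        if w != 0:
            y = E(x)
            numerator[y] = numerator.get(y, Fraction(0)) + w
    if R is None:
        return numerator
    # Denominator (E x:delta . R). Our convention: a zero-probability
    # conditioning event yields the empty sub-distribution.
    denominator = sum(numerator.values(), Fraction(0))
    return {y: w / denominator for y, w in numerator.items()}

coin = {True: Fraction(1, 2), False: Fraction(1, 2)}                # T (+) F
children = {(x, y): coin[x] * coin[y] for x in coin for y in coin}  # iid genders
both_boys = comprehension(children,
                          R=lambda c: c[0] or c[1],    # at least one boy
                          E=lambda c: c[0] and c[1])   # both boys
print(both_boys)  # {True: Fraction(1, 3), False: Fraction(2, 3)}
```

With a Boolean `R` this is conditioning (3); with a real-valued `R` it is the a posteriori revision; with `R` omitted it is the push-forward.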



8.2 Program semantics for the HMM core: revelations 



We recall from §3.1 that an HMM is determined by two stochastic kernels (matrices) $T, E$. In programming terms $T$ represents a probabilistic assignment to our hidden variable $h$; we deal with that at Choose prob. hidden in §8.3 below.

The $E$, on the other hand, releases information (about $h$) in what we call a "revelation": observables our attacker can see [29]. It has two forms, the second a generalisation of the first.
In the two definitions below, and further, we write $E.v.h$ for an expression in which program variables $v, h$ might occur free. The same convention applies to $D, G, p$ for distributions, (Boolean) guards and probabilities respectively.

Program type    Program text P    Semantics $[\![P]\!].(v,\delta)$

Reveal value    $\mathsf{reveal}\;E.v.h$    $\{\!\{h{:}\delta\cdot(v,\{\!\{h'{:}\delta\mid E.v.h'{=}E.v.h\}\!\})\}\!\}$

Expression $E.v.h$ takes its value in some type $\mathcal{X}$ representing observations an attacker can make. The command reveals a value $x$ depending on $v, h$. Neither $v$ nor $h$ is changed by this; but the outgoing distribution of the hidden $h$ is conditioned on the basis of the $x$ revealed. Note that $x$ is not stored; but because of perfect recall an attacker can remember it.

Reveal choice    $\mathsf{reveal}\;D.v.h$    $\{\!\{h{:}\delta;\,x{:}D.v.h\cdot(v,\{\!\{h{:}\delta\mid D.v.h.x\}\!\})\}\!\}$

Expression $D.v.h$ is now more generally of type $\mathbb{D}\mathcal{X}$, so that for $x:\mathcal{X}$ the value $D.v.h.x$ is a probability. The command calculates that distribution, and then chooses some value $x$ according to those probabilities; that value $x$ is then revealed. As before, variables $v, h$ are not changed; but the distribution of $h$ is conditioned on the fact that $x$ was revealed. Reveal value is the special case $\mathsf{reveal}\;\{\!\{E.v.h\}\!\}$ of Reveal choice.
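A minimal sketch of Reveal choice, assuming a fixed visible value v, a prior δ given as a dict over h, and a hypothetical Python function `D(v, h)` returning a dict over observations. Inners are frozen so that they can index the output hyper, and equal inners arising from different observations are merged. The Boolean channel `noisy` is the one described in the example following Lem. 9.1; all names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(dist):
    """Hashable form of an inner distribution (zero entries dropped)."""
    return frozenset((h, p) for h, p in dist.items() if p != 0)

def reveal_choice(D, v, delta):
    """[[reveal D.v.h]].(v, delta), with D(v, h) a dict over observations x."""
    joint = defaultdict(Fraction)                  # probability of (h, x)
    for h, p in delta.items():
        for x, q in D(v, h).items():
            joint[(h, x)] += p * q
    hyper = defaultdict(Fraction)
    for x in {x for (_h, x) in joint}:
        px = sum(w for (_h, y), w in joint.items() if y == x)
        if px == 0:
            continue
        posterior = {h: w / px for (h, y), w in joint.items() if y == x}
        hyper[(v, freeze(posterior))] += px        # equal inners are merged
    return dict(hyper)

def reveal_value(E, v, delta):
    """Reveal value as the special case reveal {{E.v.h}}."""
    return reveal_choice(lambda v, h: {E(v, h): Fraction(1)}, v, delta)

def noisy(v, h):
    """If h is T emit T w.p. 1/4 and F w.p. 3/4; if h is F emit F."""
    return {True: Fraction(1, 4), False: Fraction(3, 4)} if h else {False: Fraction(1)}

coin = {True: Fraction(1, 2), False: Fraction(1, 2)}
hyper = reveal_choice(noisy, "v0", coin)
print(sorted(hyper.values()))  # [Fraction(1, 8), Fraction(7, 8)]
```

Observing T (probability 1/8) pins $h$ down to T; observing F (probability 7/8) leaves the inner $\{\!\{T@\tfrac37,\,F@\tfrac47\}\!\}$.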



8.3 Semantics of syntactically atomic commands 

Syntactically atomic commands are regarded as semantically atomic in the sense that the only information they leak is what the final value of the visible $v$ allows to be deduced about the final value of $h$ with knowledge of the program text. Thus for example $v{:=}h$ leaks everything about $h$, since $v$'s final value is evidently the same as $h$'s; yet $v{:=}0{\times}h$ reveals nothing, even though at some point in an internal register the value of $h$ might have been accessible. In this sense the syntactic atoms are the atoms of observation also: within them neither perfect recall nor implicit flow makes sense.

We determine the semantics of these atomic commands systematically. Using "classical," i.e. without-noninterference, probabilistic sequential semantics [25, etc.] gives a straightforward meaning to atomic commands' actions on a state space $S$, as functions of type $S\to\mathbb{D}S$ taking an initial state to a (sub-)distribution of final states. If we abstract from noninterference properties by considering $v$ to be hidden (as well as $h$), and set $S:=\mathcal{V}\times\mathcal{H}$, then we have a ready-made classical semantics for the syntactic atoms we are dealing with here.

The initial "state" will be a pair $(v,\delta)$ in $\mathcal{V}\times\mathbb{D}\mathcal{H}$. We therefore reuse $Q$ from Def. 7.1 to express this as the joint distribution $\{\!\{v\}\!\}\times\delta$ of type $\mathbb{D}(\mathcal{V}\times\mathcal{H})$, that is $\mathbb{D}S$. To apply a command with semantics of type $S\to\mathbb{D}S$ to that, we use lifting (§3.4), so that the result of this classical interpretation is again of type $\mathbb{D}(\mathcal{V}\times\mathcal{H})$; and we convert this back to the noninterference output type $\mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$ by analogy with "revealing $v$" according to the semantics above, since knowledge of $v$'s final value is all that escapes an atomic command. Following Reveal value from above, we define

$$\mathit{rv}.\Delta \;:=\; \{\!\{(v,h){:}\Delta\cdot(v,\{\!\{(v',h'){:}\Delta\mid v'{=}v\cdot h'\}\!\})\}\!\}\,. \tag{5}$$
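The conversion (5) can be sketched as follows, assuming a sub-distribution over (v, h) pairs is a Python dict; `rv` follows (5), and `classical_then_rv` (a hypothetical helper) performs the whole procedure for a classical command given as a function from (v, h) to a dict over (v', h'). The two sample commands are $v{:=}h$ and $v{:=}0{\times}h$ from earlier in this section.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(dist):
    """Hashable form of an inner distribution (zero entries dropped)."""
    return frozenset((h, p) for h, p in dist.items() if p != 0)

def rv(Delta):
    """(5): from a sub-distribution Delta over (v, h) pairs, reveal v and keep
    h hidden, giving a partial hyper over (v, inner) pairs."""
    by_v = defaultdict(lambda: defaultdict(Fraction))
    for (v, h), p in Delta.items():
        by_v[v][h] += p
    hyper = {}
    for v, part in by_v.items():
        pv = sum(part.values(), Fraction(0))     # weight of this visible value
        if pv != 0:
            hyper[(v, freeze({h: p / pv for h, p in part.items()}))] = pv
    return hyper

def classical_then_rv(command, v, delta):
    """Lift a classical command, (v, h) -> dict over (v', h'), over the joint
    distribution {{v}} x delta, then apply rv to the result."""
    out = defaultdict(Fraction)
    for h, p in delta.items():
        for s, q in command(v, h).items():
            out[s] += p * q
    return rv(dict(out))

prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
leaky = classical_then_rv(lambda v, h: {(h, h): Fraction(1)}, 7, prior)       # v := h
silent = classical_then_rv(lambda v, h: {(0 * h, h): Fraction(1)}, 7, prior)   # v := 0*h
print(len(leaky), len(silent))  # 2 1
```

Here $v{:=}h$ splits the prior into two point inners (everything leaks), whereas $v{:=}0{\times}h$ leaves a single inner equal to the prior (nothing leaks). Since `rv` never discards weight, a partial input yields a partial hyper of the same total weight.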



The result of the procedure above (convert incoming $\mathcal{V}\times\mathbb{D}\mathcal{H}$ to $\mathbb{D}(\mathcal{V}\times\mathcal{H})$, then apply the lifted classical semantics, then apply $\mathit{rv}$ to the result) is summarised below. Observe that neither abort nor assertions are necessarily useful for writing specific programs; but our focus is on reasoning about programs, in particular algebraically, and for that these commands play a prominent role.

Program type    Program text P    Semantics $[\![P]\!].(v,\delta)$

Least element    $\mathsf{abort}$    $\{\!\{\}\!\}$

This is the program that simply fails to terminate: for every input it produces the empty subdistribution as output. In our refinement order, as a specification it allows all possible implementations (i.e. $\mathsf{abort}\sqsubseteq S$ for all $S$), essentially playing the role of "0" in arithmetic.

Identity    $\mathsf{skip}$    $\{\!\{(v,\delta)\}\!\}$

The "do nothing" command simply converts its input to a point-hyper on output, i.e. reproduces its input with probability one.

Assertion    $\{p.v.h\}$    $\{\!\{(v,\{\!\{h{:}\delta\mid p.v.h\}\!\})\,@\,(\mathcal{E}\,h'{:}\delta\cdot p.v.h')\}\!\}$

An assertion gives directly in $p.v.h$ the probability of the command's termination. With probability $1{-}p$ the assertion behaves as abort. When, with probability $p$, it does terminate, however, it conditions the hidden value's distribution $\delta$ on the fact that it did so: that is, $\delta$ is revised to reflect that the abort did not occur. The visible variable $v$ is unaffected in this case.

Assign to visible    $v{:=}E.v.h$    $\{\!\{h{:}\delta\cdot(E.v.h,\{\!\{h'{:}\delta\mid E.v.h'{=}E.v.h\}\!\})\}\!\}$

The command's effect is to assign the rhs-value to $v$ but also to condition the hidden distribution on the fact that $h$ can produce the value observed to have been put into $v$.

Assign to hidden    $h{:=}E.v.h$    $\{\!\{(v,\{\!\{h{:}\delta\cdot E.v.h\}\!\})\}\!\}$

The command does not change $v$, but maps the hidden incoming distribution of $h$ through $E.v$, considered as a function of (incoming) $h$, to produce the resulting distribution on (outgoing) $h$.

Choose prob. visible    $v{:\in}D.v.h$    $\{\!\{v'{:}(\mathcal{E}\,h{:}\delta\cdot D.v.h)\cdot(v',\{\!\{h'{:}\delta\mid D.v.h'.v'\}\!\})\}\!\}$

Expression $D.v.h$ is a distribution on $\mathcal{V}$, and the choice of $v$'s new value is made according to it. It generalises Assign to visible, since the latter can be written $v{:\in}\{\!\{E.v.h\}\!\}$.

Choose prob. hidden    $h{:\in}D.v.h$    $\{\!\{(v,(\mathcal{E}\,h{:}\delta\cdot D.v.h))\}\!\}$

We justify (5) informally by noting that it is what results from replacing hidden $h$ in the rhs of Reveal value by the hidden pair $(v,h)$ and considering the expression $E.v.h$ to be simply $v$.
Expression $D.v.h$ is now a distribution on $\mathcal{H}$, and the choice of $h$'s new value is made according to it. It generalises Assign to hidden, since the latter can be written $h{:\in}\{\!\{E.v.h\}\!\}$.



As a syntactic convenience, when we are using the more general "choose" form of either command but the rhs's distribution is written out using $(\oplus)$ rather than as a $\{\!\{\cdot\}\!\}$-style comprehension, we use the conventional assignment symbol $(:=)$, so that e.g. we can write $v{:=}T\oplus F$ for flipping a fair Boolean coin.

As an example of the algebraic utility of Assertion, we note that the distinguished commands abort and skip are special cases of assertions, so that $\mathsf{skip}=\{T\}$ and $\mathsf{abort}=\{F\}$. Further, the semantics of Reveal choice can be given more compactly, assuming $D.v.h$ has type $\mathbb{D}\mathcal{X}$, as

$$[\![\mathsf{reveal}\;D.v.h]\!].(v,\delta) \;=\; (\textstyle\sum x{:}\mathcal{X}\cdot[\![\{D.v.h.x\}]\!].(v,\delta))\,. \tag{6}$$

That formulation makes it easy to reason about revelations in terms of more primitive commands. We also have that assignments to visible variables that may depend on $h$ may be represented more simply in terms of those that do not:

$$[\![v{:\in}D.v.h]\!].(v,\delta) \;=\; (\textstyle\sum v'{:}\mathcal{V}\cdot[\![\{D.v.h.v'\};\,v{:=}v']\!].(v,\delta))\,. \tag{7}$$
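Equation (6) can be checked on a small case with the following sketch, assuming hypers are dicts keyed by (v, frozen inner) pairs (all names hypothetical). It builds the Reveal choice hyper as a sum of Assertion hypers, one per observation $x$, using the Boolean channel from the example following Lem. 9.1, and it also lets us confirm that $\{T\}$ behaves as skip and $\{F\}$ as abort.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(dist):
    """Hashable form of an inner distribution (zero entries dropped)."""
    return frozenset((h, p) for h, p in dist.items() if p != 0)

def assertion(p, v, delta):
    """[[{p.v.h}]].(v, delta): terminate with probability (E h:delta . p.v.h),
    conditioning delta on termination; otherwise behave as abort."""
    mass = sum((q * p(v, h) for h, q in delta.items()), Fraction(0))
    if mass == 0:
        return {}                                     # abort: the empty hyper
    posterior = {h: q * p(v, h) / mass for h, q in delta.items()}
    return {(v, freeze(posterior)): mass}

def hyper_sum(hypers):
    """Pointwise sum of partial hypers."""
    out = defaultdict(Fraction)
    for hyper in hypers:
        for key, w in hyper.items():
            out[key] += w
    return dict(out)

def reveal_via_assertions(D, X, v, delta):
    """Right-hand side of (6): the sum over x in X of [[{D.v.h.x}]].(v, delta)."""
    return hyper_sum(assertion(lambda v, h, x=x: D(v, h).get(x, Fraction(0)), v, delta)
                     for x in X)

def noisy(v, h):
    """If h is T emit T w.p. 1/4 and F w.p. 3/4; if h is F emit F."""
    return {True: Fraction(1, 4), False: Fraction(3, 4)} if h else {False: Fraction(1)}

prior = {True: Fraction(1, 2), False: Fraction(1, 2)}
hyper = reveal_via_assertions(noisy, [True, False], "v", prior)
print(sorted(hyper.values()))  # [Fraction(1, 8), Fraction(7, 8)]
```

The resulting hyper has the same two inners, with the same weights, as the direct Reveal choice semantics gives for this channel.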

As we will see in the next section, assertions also play an important role in the 
specification of probabilistic choice and conditionals. 

8.4 Semantics of compound commands: implicit flow 

Compound commands are in fact the simplest to define, since they are treated 
almost as they would be for classical semantics. The only adjustment is to insert 
conditioning assertions on program branch-points to enforce implicit flow, that 
is that information escapes by observation of the outcome of conditionals. 

Program type    Program text P    Semantics $[\![P]\!].(v,\delta)$

Composition    $P_1;P_2$    $[\![P_2]\!]^*.([\![P_1]\!].(v,\delta))$

Sequential composition is interpreted as Kleisli composition (§3.4).

General prob. choice    $P_1\;{}_{p.v.h}\!\oplus\;P_2$    $[\![\{p.v.h\};P_1]\!].(v,\delta)+[\![\{1{-}p.v.h\};P_2]\!].(v,\delta)$

Expression $p.v.h$ is evaluated to a probability of the command's taking its left branch; otherwise it takes the right. The attacker can observe which branch was taken: this is reflected in the conditioning assertions at the beginning of each branch.

Conditional choice    $\mathsf{if}\;G.v.h\;\mathsf{then}\;P_T\;\mathsf{else}\;P_F\;\mathsf{fi}$

The most general form of atomic assignment is the Simultaneous choice mentioned earlier, whose semantics can be deduced as for the others from its classical behaviour. Since it is seldom needed, however, we omit its definition for brevity.
$[\![\{G.v.h\};P_T]\!].(v,\delta)+[\![\{\neg G.v.h\};P_F]\!].(v,\delta)$

This is a specialisation of the previous General probabilistic choice to the case where the probability is always either 1 (go left) or 0 (go right). Again the conditioning assertions guard each branch.

Iteration    $\mathsf{while}\;p.v.h\;\mathsf{do}\;P\;\mathsf{od}$    the $(\le)$-least fixed point of $\mathcal{L}$, applied to $(v,\delta)$, where $\mathcal{L}$ is the unique endofunction on the space $\mathcal{S}\to\mathbb{D}\mathcal{S}$ of programs' meanings such that for any program $L$ we have $[\![(P;L)\;{}_{p.v.h}\!\oplus\;\mathsf{skip}]\!]=\mathcal{L}.[\![L]\!]$.

As for Conditional choice, the loop guard is a probability determined by the program variables $v, h$, with as a special case the Booleans $T, F$ interpreted as 1 (enter the loop) or 0 (terminate the loop).
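The following sketch, under illustrative assumptions (program meanings as Python functions from (v, δ) to hypers keyed by (v, frozen inner); all names hypothetical), builds General prob. choice and Conditional choice from assertions exactly as in the table above, and shows implicit flow: a conditional whose guard mentions $h$ splits the hidden distribution, while one whose guard mentions only $v$ does not.

```python
from collections import defaultdict
from fractions import Fraction

def freeze(dist):
    """Hashable form of an inner distribution (zero entries dropped)."""
    return frozenset((h, p) for h, p in dist.items() if p != 0)

def hyper_sum(*hypers):
    """Pointwise sum of partial hypers."""
    out = defaultdict(Fraction)
    for hyper in hypers:
        for key, w in hyper.items():
            out[key] += w
    return dict(out)

# A program meaning maps (v, delta) to a partial hyper {(v', inner): prob}.
def skip(v, delta):
    return {(v, freeze(delta)): Fraction(1)}

def assert_(p):
    """{p.v.h}: terminate w.p. (E h:delta . p.v.h), conditioning delta on it."""
    def run(v, delta):
        mass = sum((q * p(v, h) for h, q in delta.items()), Fraction(0))
        if mass == 0:
            return {}
        return {(v, freeze({h: q * p(v, h) / mass for h, q in delta.items()})): mass}
    return run

def seq(P1, P2):
    """P1; P2 as Kleisli composition: run P2 from every outcome of P1."""
    def run(v, delta):
        out = defaultdict(Fraction)
        for (v1, inner), w in P1(v, delta).items():
            for key, w2 in P2(v1, dict(inner)).items():
                out[key] += w * w2
        return dict(out)
    return run

def pchoice(p, P1, P2):
    """P1 p.v.h(+) P2: an assertion at each branch records which one was taken."""
    def run(v, delta):
        return hyper_sum(seq(assert_(p), P1)(v, delta),
                         seq(assert_(lambda v, h: 1 - p(v, h)), P2)(v, delta))
    return run

def cond(G, PT, PF):
    """if G.v.h then PT else PF fi: the case where the probability is 1 or 0."""
    return pchoice(lambda v, h: Fraction(int(G(v, h))), PT, PF)

prior = {h: Fraction(1, 4) for h in range(4)}
leaks = cond(lambda v, h: h < 2, skip, skip)(0, prior)     # guard mentions h
silent = cond(lambda v, h: v == 0, skip, skip)(0, prior)   # guard mentions only v
print(len(leaks), silent == skip(0, prior))  # 2 True
```

The same functions confirm two assertion laws of §9.2 on small cases: $\mathsf{skip}\;{}_{1/2}\!\oplus\;\mathsf{skip}$ equals $\mathsf{skip}$, whereas a branch probability that depends on $h$ produces two distinct inners.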



For iteration we are taking the usual least-fixed-point approach except that, for the reasons explained above, we use the special termination order $(\le)$ for the chain of iterates. For this we need the (usual) technical result of continuity of our program contexts.

Lemma 8.1 Continuity of program contexts Any context $\mathcal{C}(\cdot)$, constructed in the programming language above, satisfies $\mathcal{C}(\bigvee_i P_i)=\bigvee_i\mathcal{C}(P_i)$ for non-empty $(\le)$-chains $(P_i)_i$.
Proof: Because the termination order is so simple (unlike the entropy order), being essentially pointwise less-than-or-equals, this result follows easily from linearity of the Kleisli composition (essentially lifting) used in the definition of sequential composition. □

Importantly, each of our compound operators is monotonic with respect to its arguments and the secure refinement order $(\sqsubseteq)$, meaning that we may reason compositionally about the correctness of programs.

Theorem 8.1 Monotonicity of compound commands Each of the commands listed above is monotonic with respect to its program arguments and the refinement order. □

8.5 Local and multiple variables; hidden correlations

To this point we have had just two variables, visible $v$ and hidden $h$, and have been assuming for simplicity that they are all the variables in the program. In practice, however, each of $\mathcal{V},\mathcal{H}$ will comprise many variables, represented in the usual Cartesian way. Thus if we have variables $a{:}\mathcal{A}, b{:}\mathcal{B}, c{:}\mathcal{C}, d{:}\mathcal{D}$, with the first two $a, b$ visible and the last two $c, d$ hidden, then $\mathcal{V}$ is $\mathcal{A}\times\mathcal{B}$ and $\mathcal{H}$ is $\mathcal{C}\times\mathcal{D}$, so that the state space is $\mathcal{A}\times\mathcal{B}\times\mathbb{D}(\mathcal{C}\times\mathcal{D})$. Assignments and projections are handled as normal.
Thus we allow local variables, both visible and hidden, which extend the state as described above: within the scope of a visible local-variable declaration $|[\,\mathsf{vis}\;x{:}\mathcal{X}\cdot\cdots\,]|$, the $\mathcal{V}_{\mathrm{local}}$ used is $\mathcal{X}\times\mathcal{V}_{\mathrm{global}}$. Hidden variables are similar.

Note, however, that because for simplicity we have been assuming that $v, h$ are in fact all the variables in the program, i.e. that they stand for vectors of variables implicitly, our semantics above establishes the equality of the two fragments $v{:=}h;\;v,h{:=}0,0$ and $v,h{:=}0,0$, reflecting our deliberate concentration on $h$'s final value [36, 31] in order to extend conventional refinement [33, 15], which does the same. In this case $h$'s initial value's being revealed on the left has no bearing on our knowledge or ignorance of its final value, and so does not introduce a difference in meaning between the two fragments shown.

If, however, there are other hidden variables, not mentioned but still in scope, as might happen within a local block or within the context of extra declarations, then our semantics must be slightly more general, in particular recognising that the $v$ or $h$ appearing on the left of an assignment is just one component of a vector of visible resp. hidden variables.

Technically this is handled by extending our hidden distribution to type $\mathbb{D}(\mathcal{H}^2)$, which tracks correlations with initial values. For simplicity we do not do that here, since in fact any program in which hiddens are not assigned to (as in our examples and case studies) can be treated with the simpler $\mathbb{D}\mathcal{H}$-style semantics.



9 Algebra of HMM-style programs 

The programming language introduced in §8, interpreted over the hyper-based semantics, admits a program algebra allowing the proof of general refinements between programs. In this section we present some of the foundational laws of this program algebra, which are then illustrated in §9.8 and §10 via an example based on password guessing.



9.1 General principles and scoping laws; 
referential transparency 

As for classical programs, it is possible to replace expressions by other expressions of equal value in context so that, for example, referential transparency gives

$$v{:=}E.v.h;\,\{T\} \;=\; v{:=}E.v.h;\,\{v{=}E.v.h\}$$

(provided $E.v.h$ does not itself depend on $v$, whose value the assignment changes).

It is also possible to move program fragments in and out of local scopes provided variable bindings are respected. Since empty scopes are equivalent to skip, i.e.

$$\mathsf{skip} \;=\; |[\,\mathsf{vis}\;v'{:}\mathcal{V}\,]| \;=\; |[\,\mathsf{hid}\;h'{:}\mathcal{H}\,]|\,, \tag{8}$$

Implicitly, local variables are assumed to be initialised by a uniform choice over their finite state space. In our examples, however, we always initialise local variables explicitly, to avoid confusion.






it is possible to introduce fresh variables of any constant type. We may also introduce assignments to scope-terminated variables as long as they do not reveal information about the hidden state:

$$|[\,\mathsf{vis}\;v'{:}\mathcal{V};\;\cdots\,]| \;=\; |[\,\mathsf{vis}\;v'{:}\mathcal{V};\;\cdots;\;v'{:\in}D.v\,]|\,, \tag{9}$$
$$|[\,\mathsf{hid}\;h'{:}\mathcal{H};\;\cdots\,]| \;=\; |[\,\mathsf{hid}\;h'{:}\mathcal{H};\;\cdots;\;h'{:\in}D.v.h.h'\,]|\,. \tag{10}$$

As an example of the interaction of local scopes and visibility we have

$$\begin{aligned}
&[\![\mathsf{reveal}\;D.v.h]\!]\\
=\;&(\textstyle\sum x{:}\mathcal{X}\cdot[\![\{D.v.h.x\}]\!]) && \text{represent revelation using assertions (6)}\\
=\;&(\textstyle\sum x{:}\mathcal{X}\cdot[\![\,|[\,\mathsf{vis}\;v'{:}\mathcal{X};\;\{D.v.h.x\};\;v'{:=}x\,]|\,]\!]) && \text{introduce fresh variable terminated by a secure assignment}\\
=\;&[\![\,|[\,\mathsf{vis}\;v'{:}\mathcal{X};\;v'{:\in}D.v.h\,]|\,]\!]\,, && \text{shift scope and represent visible assignment using assertions (7)}
\end{aligned}$$

i.e. a revelation is effectively an assignment to a temporary visible variable: because of perfect recall, the revealed value is not forgotten; but because the temporary variable is declared within a block, it is effectively erased.

9.2 Assertions 

We present here some basic properties of assertions that will be used to justify algebraic laws for more complex statements such as revelations and probabilistic choices. First, we have that assertions satisfy the following equivalence,

$$\{p_1.v.h\};\{p_2.v.h\} \;=\; \{p_1.v.h\times p_2.v.h\} \;=\; \{p_2.v.h\};\{p_1.v.h\}\,, \tag{11}$$

and are thus commutative under sequential composition. Constant assertions also commute over arbitrary programs, so that

$$\{p\};S \;=\; S;\{p\}\,. \tag{12}$$

Since assertions referring to $h$ may condition the hidden state, from the definition of secure refinement (Def. 7.3) we have

$$(\textstyle\sum n\cdot[\![\{p_n.v.h\}]\!]) \;\sqsubseteq\; [\![\{\textstyle\sum n\cdot p_n.v.h\}]\!]\,, \tag{13}$$

for $(\sum n\cdot p_n.v.h)\le1$. Using this we can calculate that $\mathsf{skip}\;{}_{p.v.h}\!\oplus\;\mathsf{skip}\sqsubseteq\mathsf{skip}$, since from implicit flow the lhs reveals $p.v.h$ but the rhs reveals nothing. On the other hand, assertions that refer only to the visible state reveal nothing when added, and thus (13) can be strengthened to equality, giving

$$(\textstyle\sum n\cdot[\![\{p_n.v\}]\!]) \;=\; [\![\{\textstyle\sum n\cdot p_n.v\}]\!]\,, \tag{14}$$

whence $\mathsf{skip}\;{}_{p.v}\!\oplus\;\mathsf{skip}=\mathsf{skip}$.

Using this algebra of assertions, for Booleans $G_i.v.h$ we have

$$\{G_1.v.h\};\{G_2.v.h\} \;=\; \{G_1.v.h\land G_2.v.h\}\,, \tag{15}$$

and so Boolean assertions are idempotent, that is $\{G.v.h\};\{G.v.h\}=\{G.v.h\}$, and an assertion composed with its complement aborts: $\{G.v.h\};\{\neg G.v.h\}=\mathsf{abort}$. When all Boolean $G_n.v.h$ are disjoint we also have

$$(\textstyle\sum n{:}[1..N]\cdot[\![\{G_n.v.h\}]\!]) \;\sqsubseteq\; [\![\{\textstyle\bigvee n{:}[1..N]\cdot G_n.v.h\}]\!]\,, \tag{16}$$
$$(\textstyle\sum n{:}[1..N]\cdot[\![\{G_n.v\}]\!]) \;=\; [\![\{\textstyle\bigvee n{:}[1..N]\cdot G_n.v\}]\!]\,. \tag{17}$$

9.3 Basic laws for revelations 

A single reveal releases information but changes no variable. Using refinement
we can express, as reveal D_1 ⊑ reveal D_2, that revealing D_2 leaks no more
information than revealing D_1 would have. Because refinement between programs
quantifies over all incoming distributions, this statement applies to any of them.

We write reveal (E_1,E_2) for the release of two pieces of information, one
defined by expression E_1 and the other defined by expression E_2. For example
reveal (h mod 2, h mod 3) releases information about both h's divisibility by 2
and by 3: this is more informative than releasing just one, giving the refinement

$\mathsf{reveal}\ (h \bmod 2,\ h \bmod 3) \;\sqsubseteq\; \mathsf{reveal}\ h \bmod 2$ .   (18)
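As a quick numerical check of (18), the Python sketch below computes the leakage of both sides for h uniform over 0..5. The range of h, the prior and the helper names (`reveal`, `post`, `bayes_vuln`) are assumptions made for illustration only. It also confirms that projecting each observed pair onto its first component, as Lem. 9.1 below permits, yields exactly the single revelation.

```python
from collections import defaultdict
from fractions import Fraction as Fr

H = range(6)                                   # hypothetical: h uniform on 0..5
prior = {h: Fr(1, 6) for h in H}

def reveal(expr):
    """Deterministic revelation of expr(h), as a channel h -> {obs: prob}."""
    return {h: {expr(h): Fr(1)} for h in H}

def post(channel, f):
    """Post-process every observation o into f(o) (a deterministic F in Lem. 9.1)."""
    out = {}
    for h, row in channel.items():
        new = defaultdict(Fr)
        for o, p in row.items():
            new[f(o)] += p
        out[h] = dict(new)
    return out

def bayes_vuln(prior, channel):
    """Sum over observations of the largest joint probability prior(h)*channel(h)(o)."""
    best = defaultdict(Fr)
    for h, ph in prior.items():
        for o, p in channel[h].items():
            best[o] = max(best[o], ph * p)
    return sum(best.values())

pair = reveal(lambda h: (h % 2, h % 3))
single = reveal(lambda h: h % 2)

print(bayes_vuln(prior, pair), bayes_vuln(prior, single))   # 1 1/3
print(post(pair, lambda o: o[0]) == single)                  # True
```

On this range the pair identifies h completely, while h mod 2 alone leaves three candidates.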

As we shall see, this and a number of other laws can be derived from a single 
general refinement rule which effectively states that any released information 
can be concealed somewhat by distributing it stochastically. 

Lemma 9.1 Basic reveal refinement Let D.v.h be a distribution over some
X and F be a stochastic matrix (which can depend on v) giving for each element
of X a distribution over some other type Y. Then we have

$\mathsf{reveal}\ D.v.h \;\sqsubseteq\; \mathsf{reveal}\ D.v.h \otimes F.v$ ,

where $(\otimes)$ is defined by $(D.v.h \otimes F.v).y := \big(\sum x{:}\mathcal{X} \cdot D.v.h.x \times F.v.x.y\big)$.
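Proof: We reason as follows:

  $\llbracket \mathsf{reveal}\ D.v.h \rrbracket$
= $\big(\sum x{:}\mathcal{X} \cdot \llbracket\{D.v.h.x\}\rrbracket\big)$   "define revelation using assertions (6)"
= $\big(\sum x{:}\mathcal{X} \cdot \llbracket\{D.v.h.x\};\{\textstyle\sum y{:}\mathcal{Y} \cdot F.v.x.y\}\rrbracket\big)$   "distribution F.v is full; {1} = skip is unit of composition (33)"
= $\big(\sum x,y{:}\mathcal{X},\mathcal{Y} \cdot \llbracket\{D.v.h.x\};\{F.v.x.y\}\rrbracket\big)$   "(14); Kleisli composition distributes over addition"
= $\big(\sum x,y{:}\mathcal{X},\mathcal{Y} \cdot \llbracket\{D.v.h.x \times F.v.x.y\}\rrbracket\big)$   "(11)"
⊑ $\big(\sum y{:}\mathcal{Y} \cdot \llbracket\{\textstyle\sum x{:}\mathcal{X} \cdot D.v.h.x \times F.v.x.y\}\rrbracket\big)$   "(13)"
= $\llbracket \mathsf{reveal}\ D.v.h \otimes F.v \rrbracket$ .   "define revelation using assertions (6)"

If D.v.h and F.v are expressed as matrices then $(\otimes)$ is matrix multiplication.
□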

As an example of Lem. 9.1 we suppose h is Boolean, and that we have a
revelation behaving as follows. If h is T then T is emitted with probability 1/4
and F with probability 3/4; if h is F then F is emitted unconditionally. We write
this reveal D.h (omitting the .v in this simple case) via the D-matrix

$$
\begin{array}{r|cc}
 & \mathsf{T} & \mathsf{F} \\ \hline
h=\mathsf{T} & 1/4 & 3/4 \\
h=\mathsf{F} & 0 & 1
\end{array}
\qquad \text{(columns: emitted value)} \qquad (19)
$$

Now we can condition on the emitted value, so defining a partition on any
incoming state: for example if the incoming state s is $(v, \{\!\{\mathsf{T}@\tfrac12, \mathsf{F}@\tfrac12\}\!\})$ then
$\llbracket \mathsf{reveal}\ D.h \rrbracket.s = \{\!\{\, (v,\{\!\{\mathsf{T}@1\}\!\})@\tfrac18,\ (v,\{\!\{\mathsf{T}@\tfrac37, \mathsf{F}@\tfrac47\}\!\})@\tfrac78 \,\}\!\}$, expressing the fact that
T is emitted only if h is T, thus completely revealing h in this case; however this
happens only 1/8 of the time; the remaining 7/8 of the time h is only partly
revealed, with the a posteriori distribution's being merely F-skewed.

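Now suppose that the process is overlaid by another process F (again omitting
.v) which obscures the information emitted by reveal D.h by changing the
values stochastically:

$$
\begin{array}{r|cc}
 & \mathsf{T} & \mathsf{F} \\ \hline
\text{emission from } D.h \text{ was } \mathsf{T} & 1 & 0 \\
\text{emission from } D.h \text{ was } \mathsf{F} & 1/2 & 1/2
\end{array}
\qquad \text{(columns: new emitted value)} \qquad (20)
$$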

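Overall, the value actually emitted by the combination is determined by the
product of the matrices in (19) and (20), that is

$$
\begin{pmatrix} 1/4 & 3/4 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 1/2 & 1/2 \end{pmatrix}
=
\begin{pmatrix} 5/8 & 3/8 \\ 1/2 & 1/2 \end{pmatrix} ,
$$

which for the chosen incoming distribution gives that $\llbracket \mathsf{reveal}\ D.h \otimes F \rrbracket.s$ is
$\{\!\{\, (v,\{\!\{\mathsf{T}@\tfrac59, \mathsf{F}@\tfrac49\}\!\})@\tfrac{9}{16},\ (v,\{\!\{\mathsf{T}@\tfrac37, \mathsf{F}@\tfrac47\}\!\})@\tfrac{7}{16} \,\}\!\}$, leaking less than $\llbracket \mathsf{reveal}\ D.h \rrbracket.s$.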
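The arithmetic of this example can be replayed mechanically. The Python sketch below uses our own encoding, not the paper's: channels are nested dicts, Booleans are Python's True/False, and a hyper maps each posterior to its outer probability. It multiplies (19) by (20) and then computes both output hypers for the uniform incoming distribution. Bayes vulnerability appears only as one convenient measure of how much less the combination leaks.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Matrix (19): rows are the hidden h, columns the value emitted by reveal D.h.
D = {True: {True: Fr(1, 4), False: Fr(3, 4)}, False: {False: Fr(1)}}
# Matrix (20): rows are D's emission, columns the new emitted value.
F = {True: {True: Fr(1)}, False: {True: Fr(1, 2), False: Fr(1, 2)}}

def times(D, F):
    """(D ⊗ F).h.y = sum over x of D.h.x * F.x.y, i.e. matrix multiplication."""
    out = {}
    for h, row in D.items():
        acc = defaultdict(Fr)
        for x, p in row.items():
            for y, q in F[x].items():
                acc[y] += p * q
        out[h] = dict(acc)
    return out

def hyper(prior, channel):
    """Output hyper: {posterior as frozenset of (h, prob) pairs: outer probability}."""
    joint = defaultdict(dict)
    for h, ph in prior.items():
        for o, po in channel[h].items():
            if ph * po:
                joint[o][h] = joint[o].get(h, 0) + ph * po
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((h, q / total) for h, q in col.items())] += total
    return dict(out)

def bayes_vuln(hyp):
    return sum(outer * max(q for _, q in inner) for inner, outer in hyp.items())

prior = {True: Fr(1, 2), False: Fr(1, 2)}          # the incoming state s of the text
DF = times(D, F)
print(DF[True][True], DF[True][False], DF[False][True], DF[False][False])  # 5/8 3/8 1/2 1/2
print(bayes_vuln(hyper(prior, D)), bayes_vuln(hyper(prior, DF)))          # 5/8 9/16
```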

Now Lem. 9.1 justifies (18) with F as the projection function onto the first
component. Other rules can be derived similarly:

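Lemma 9.2 Simple reveal rules

$$
\begin{array}{rcll}
\mathsf{reveal}\ k & = & \mathsf{skip} & (21)\\
\mathsf{reveal}\ D.v.h & \sqsubseteq & \mathsf{skip} & (22)\\
\mathsf{reveal}\ h & \sqsubseteq & \mathsf{reveal}\ D.v.h & (23)\\
\mathsf{reveal}\ G.v.h & = & \mathsf{reveal}\ \neg G.v.h & (24)\\
\mathsf{reveal}\ (E_1.v.h,\ E_2.v.h) & \sqsubseteq & \mathsf{reveal}\ E_1.v.h & (25)\\
\mathsf{reveal}\ (E.v.h,\ E.v.h) & = & \mathsf{reveal}\ E.v.h & (26)
\end{array}
$$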



Proof: The first is a consequence of the equivalent definition of revelations in
terms of assertions, and the rest are consequences of it and Lem. 9.1. For example
(22) follows by defining F.v to be constant; (23) follows by defining F.v in
Lem. 9.1 to be D.v.h; and (24) follows by defining F.v to swap the values T and F.
Finally (25) and (26) follow by defining F.v to be the projection function. □
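The equalities (24) and (26), and the refinement (22), can also be checked on a small instance. The Python sketch below is only an illustration. It assumes a four-valued hidden h with an arbitrary non-uniform prior and takes G.h to be h>1; both are hypothetical choices, and so are the helper names. It then compares output hypers: equal hypers mean the two revelations are indistinguishable for that prior.

```python
from collections import defaultdict
from fractions import Fraction as Fr

prior = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(1, 8), 3: Fr(1, 8)}   # hypothetical prior on h

def reveal(expr):
    """Deterministic revelation of expr(h), as a channel h -> {observation: prob}."""
    return {h: {expr(h): Fr(1)} for h in prior}

def post(channel, f):
    """Lem. 9.1 with a deterministic F: relabel each observation o as f(o)."""
    out = {}
    for h, row in channel.items():
        new = defaultdict(Fr)
        for o, p in row.items():
            new[f(o)] += p
        out[h] = dict(new)
    return out

def hyper(channel):
    """Output hyper {posterior (frozenset of (h, prob)): outer probability}."""
    joint = defaultdict(dict)
    for h, ph in prior.items():
        for o, po in channel[h].items():
            joint[o][h] = joint[o].get(h, 0) + ph * po
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((h, q / total) for h, q in col.items())] += total
    return dict(out)

def G(h):
    return h > 1                                   # hypothetical Boolean G.h

skip = reveal(lambda h: "k")                       # a constant revelation, i.e. skip by (21)
print(hyper(reveal(G)) == hyper(reveal(lambda h: not G(h))))           # (24): True
print(hyper(reveal(lambda h: (h, h))) == hyper(reveal(lambda h: h)))   # (26): True
print(hyper(post(reveal(G), lambda o: "k")) == hyper(skip))             # (22): True
```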



With the apparatus so far, the example in §2 could be sketched:

  v:= h÷2; v:= v÷2
= ||[ vis v'; v':= h÷2; v:= v'÷2 ]||        "classical reasoning with visibles and scopes"
= ||[ vis v'; v':= h÷2; v:= (h÷2)÷2 ]||     "referential transparency §9.1"
= ||[ vis v'; v':= h÷2 ]||; v:= h÷4         "shrink scope; arithmetic"
= reveal h÷2; v:= h÷4                       "revelation equivalence §9.1"
⊑ v:= h÷4 .                                 "(22), that reveal h÷2 ⊑ skip"



9.4 Reveals in sequence 

When two or more HMM's are executed sequentially, with the outputs from
one "fed into" another, an observer can retain information from earlier
executions and add it to the information learned by observing later executions.
The basic rule expressing the total amount of information leaked is set out next.

Lemma 9.3 Sequential reveals Let D_1.v.h and D_2.v.h be distributions over
some X and Y respectively. Then we have

$\mathsf{reveal}\ D_1.v.h;\ \mathsf{reveal}\ D_2.v.h \;=\; \mathsf{reveal}\ (D_1{\times}D_2).v.h$ ,

where (D_1×D_2).v.h is the joint distribution over ordered pairs of X×Y, defined
as usual so that (D_1×D_2).v.h.(x,y) is D_1.v.h.x × D_2.v.h.y.

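Proof: This follows directly from the definition of reveal D.v.h and sequential
composition:

  $\llbracket \mathsf{reveal}\ D_1.v.h;\ \mathsf{reveal}\ D_2.v.h \rrbracket$
= $\big(\sum x,y{:}\mathcal{X},\mathcal{Y} \cdot \llbracket\{D_1.v.h.x\};\{D_2.v.h.y\}\rrbracket\big)$   "revelations as summations (6); composition distributes over addition"
= $\big(\sum x,y{:}\mathcal{X},\mathcal{Y} \cdot \llbracket\{D_1.v.h.x \times D_2.v.h.y\}\rrbracket\big)$   "(11)"
= $\llbracket \mathsf{reveal}\ (D_1{\times}D_2).v.h \rrbracket$ .   "represent assertion summation as revelation (6)"

□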
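Lemma 9.3 can be replayed on a small instance. The Python sketch below is an illustration under our own encoding, not the paper's semantics in full. It runs two revelations one after the other with perfect recall, refining each posterior left by the first with the second, and compares the result with a single revelation of the product channel. It also checks the commutativity that underlies (27). The three-valued h, its prior and the two channels are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as Fr

prior = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(1, 6)}              # hypothetical prior
D1 = {0: {"a": Fr(1, 2), "b": Fr(1, 2)}, 1: {"a": Fr(1)}, 2: {"b": Fr(1)}}
D2 = {0: {0: Fr(1)}, 1: {1: Fr(1, 4), 0: Fr(3, 4)}, 2: {1: Fr(1)}}

def hyper(prior, channel):
    """Output hyper {posterior (frozenset of (h, prob)): outer probability}."""
    joint = defaultdict(dict)
    for h, ph in prior.items():
        for o, po in channel[h].items():
            if ph * po:
                joint[o][h] = joint[o].get(h, 0) + ph * po
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((h, q / total) for h, q in col.items())] += total
    return dict(out)

def sequentially(prior, ch1, ch2):
    """reveal ch1; reveal ch2: each posterior left by ch1 is then refined by ch2."""
    out = defaultdict(Fr)
    for inner, outer in hyper(prior, ch1).items():
        for inner2, outer2 in hyper(dict(inner), ch2).items():
            out[inner2] += outer * outer2
    return dict(out)

def product(ch1, ch2):
    """(D1 x D2).h.(x, y) = D1.h.x * D2.h.y"""
    return {h: {(x, y): p * q for x, p in ch1[h].items() for y, q in ch2[h].items()}
            for h in ch1}

print(sequentially(prior, D1, D2) == hyper(prior, product(D1, D2)))   # True
print(hyper(prior, product(D1, D2)) == hyper(prior, product(D2, D1)))  # True
```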

This rule says that we can simplify two successive reveals into a single reveal 
where the external values are gathered together and the residual probabilities 
aggregated as expected, so that overall the result is as though a single HMM 
had been executed, albeit with a modified stochastic matrix. Using this basic 
rule we can prove the following: 

^^With only a selection of laws, sometimes we must omit details in the calculations. 






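Lemma 9.4 Simple sequential rules

$$
\begin{array}{rcll}
\mathsf{reveal}\ D_1.v.h;\ \mathsf{reveal}\ D_2.v.h & = & \mathsf{reveal}\ D_2.v.h;\ \mathsf{reveal}\ D_1.v.h & (27)\\
\mathsf{reveal}\ E_1.v.h;\ \mathsf{reveal}\ E_2.v.h & = & \mathsf{reveal}\ (E_1.v.h,\ E_2.v.h) & (28)\\
\mathsf{reveal}\ h;\ \mathsf{reveal}\ D.v.h & = & \mathsf{reveal}\ h & (29)
\end{array}
$$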



Proof: Using Lem. 9.3, equation (27) follows from the underlying commutativity,
and (28) follows from the fact that reveal E.v.h equals reveal $\{\!\{E.v.h\}\!\}$.
For (29) we have from (22) and (33) that reveal h; reveal D.v.h ⊑ reveal h.
For refinement in the other direction we reason

  reveal h
= reveal (h, h)              "(26)"
= reveal h; reveal h         "(28)"
⊑ reveal h; reveal D.v.h .   "(23)"

□



The rules in Lem. 9.4 formalise our intuition about successive reveals: that
information can be revealed in any order (27), that revealing two different
expressions in succession is the same as revealing a pair containing both
expressions (28), and that once h has been revealed entirely there is nothing
more to reveal (29).



The following lemma lists further properties explaining how assertions and 
revelations interact via sequential composition. 



Lemma 9.5 Assertions and revelations in sequence

$$
\begin{array}{rcll}
\{p.v.h\};\ \mathsf{reveal}\ D.v.h & = & \mathsf{reveal}\ D.v.h;\ \{p.v.h\} & (30)\\
\{G.v.h\} & = & \{G.v.h\};\ \mathsf{reveal}\ G.v.h & (31)
\end{array}
$$



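Proof: The first equivalence is shown using a similar proof to that of Lem. 9.3.
For (31) we show

  $\llbracket\{G.v.h\}\rrbracket$
= $\llbracket\{G.v.h\}\rrbracket + \llbracket\{\mathsf{F}\}\rrbracket$   "abort is zero of program addition"
= $\llbracket\{G.v.h\};\{G.v.h\}\rrbracket + \llbracket\{G.v.h\};\{\neg G.v.h\}\rrbracket$   "separate Boolean assertions (15)"
= $\llbracket\{G.v.h\};\ \mathsf{reveal}\ G.v.h\rrbracket$ .   "composition over addition; (6)"

□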



The first (30) states that revelations and assertions commute, while the second
(31) says that after asserting predicate G.v.h, no more information can be leaked
by revealing the value of G.v.h.

9.5 Reveals in choice

In a probabilistic choice between two reveal statements, an observer may witness 
both which revelation was executed as well as the outcome of that statement. 
We can combine such a choice into a single reveal statement. 

Lemma 9.6 Choices between reveals Let D_L.v.h and D_R.v.h be distributions
over X and p.v.h be a probability; let

$\mathcal{X}_2 := \mathsf{Lft}\,\mathcal{X} + \mathsf{Rgt}\,\mathcal{X}$

be the discriminated union of two copies of X, with injection functions therefore
of type Lft, Rgt: X→X_2. We have that

$\mathsf{reveal}\ D_L.v.h \mathbin{{}_{p.v.h}\oplus} \mathsf{reveal}\ D_R.v.h \;=\; \mathsf{reveal}\ \big(\mathsf{map}.\mathsf{Lft}.(D_L.v.h) \mathbin{{}_{p.v.h}\oplus} \mathsf{map}.\mathsf{Rgt}.(D_R.v.h)\big)$ ,

where the injection functions' "tagging" of the two distributions has effectively
given them disjoint supports.

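Proof: Let D'_L.v.h, D'_R.v.h be respectively map.Lft.(D_L.v.h), map.Rgt.(D_R.v.h).
We have then

  $\llbracket \mathsf{reveal}\ D_L.v.h \mathbin{{}_{p.v.h}\oplus} \mathsf{reveal}\ D_R.v.h \rrbracket$
= $\llbracket \mathsf{reveal}\ D'_L.v.h \mathbin{{}_{p.v.h}\oplus} \mathsf{reveal}\ D'_R.v.h \rrbracket$   "Lem. 9.1"
= $\llbracket\{p.v.h\};\ \mathsf{reveal}\ D'_L.v.h\rrbracket + \llbracket\{1{-}p.v.h\};\ \mathsf{reveal}\ D'_R.v.h\rrbracket$   "probabilistic choice"
= $\big(\sum x{:}\mathcal{X} \cdot \llbracket\{p.v.h\};\{D'_L.v.h.(\mathsf{Lft}.x)\}\rrbracket + \llbracket\{1{-}p.v.h\};\{D'_R.v.h.(\mathsf{Rgt}.x)\}\rrbracket\big)$   "revelations are additions of assertions (6); additive distributivity of Kleisli composition"
= $\big(\sum x_2{:}\mathcal{X}_2 \cdot \llbracket\{(D'_L.v.h \mathbin{{}_{p.v.h}\oplus} D'_R.v.h).x_2\}\rrbracket\big)$   "(11); and D'_L.v.h.(Rgt.x), D'_R.v.h.(Lft.x) both zero"
= $\llbracket \mathsf{reveal}\ D'_L.v.h \mathbin{{}_{p.v.h}\oplus} D'_R.v.h \rrbracket$ .   "addition of assertions as revelation (6)"

□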

From this lemma we may derive the following laws concerning revelations
and probabilistic choice.

Lemma 9.7 Simple choice rules For probability p.v.h and distributions D_L.v.h
and D_R.v.h we have that

$\big(\mathsf{reveal}\ D_L.v.h \mathbin{{}_{p.v.h}\oplus} \mathsf{reveal}\ D_R.v.h\big) \;\sqsubseteq\; \mathsf{reveal}\ \big(D_L.v.h \mathbin{{}_{p.v.h}\oplus} D_R.v.h\big)$ .

However if the supports of D_L.v.h and D_R.v.h are disjoint, then the refinement
relation is an equality.

Proof: This follows from Lems. 9.6 and 9.1. □
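The distinction in Lem. 9.7 between overlapping and disjoint supports can be seen numerically. In the Python sketch below, the three-valued h, its uniform prior, the constant probability 1/2 and the three channels are all hypothetical; p.v.h could in general depend on h, so it is passed as a function. The tagged channel records which branch ran, as in Lem. 9.6, while the untagged one merges equal outputs. Dropping the tag is a Lem. 9.1 post-processing, so it can only lose information.

```python
from collections import defaultdict
from fractions import Fraction as Fr

prior = {h: Fr(1, 3) for h in (0, 1, 2)}

def hyper(channel):
    """Output hyper {posterior (frozenset of (h, prob)): outer probability}."""
    joint = defaultdict(dict)
    for h, ph in prior.items():
        for o, po in channel[h].items():
            if ph * po:
                joint[o][h] = joint[o].get(h, 0) + ph * po
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((h, q / total) for h, q in col.items())] += total
    return dict(out)

def bayes_vuln(hyp):
    return sum(outer * max(q for _, q in inner) for inner, outer in hyp.items())

def choice(DL, DR, p, tagged):
    """reveal DL p(+) reveal DR (tagged=True, Lem. 9.6) or reveal (DL p(+) DR)."""
    out = {}
    for h in prior:
        row = defaultdict(Fr)
        for o, q in DL[h].items():
            row[("L", o) if tagged else o] += p(h) * q
        for o, q in DR[h].items():
            row[("R", o) if tagged else o] += (1 - p(h)) * q
        out[h] = dict(row)
    return out

half = lambda h: Fr(1, 2)
DL = {h: {"a" if h == 0 else "b": Fr(1)} for h in prior}   # reveals whether h = 0
DR = {h: {"a" if h == 1 else "b": Fr(1)} for h in prior}   # reveals whether h = 1, same labels
DR2 = {h: {"c" if h == 1 else "d": Fr(1)} for h in prior}  # same information, disjoint labels

print(bayes_vuln(hyper(choice(DL, DR, half, True))),
      bayes_vuln(hyper(choice(DL, DR, half, False))))                             # 2/3 1/2
print(hyper(choice(DL, DR2, half, True)) == hyper(choice(DL, DR2, half, False)))  # True
```

With shared labels the untagged revelation is strictly less informative; with disjoint labels the two coincide, matching the equality case of the lemma.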
From (21) and Lem. 9.7 we have, for example, that

skip p.v.h⊕ skip = (reveal T p.v.h⊕ reveal F) = reveal (T p.v.h⊕ F) ,

thus illustrating the information leakage due to implicit flow.

9.6 Composition, probabilistic choice and conditionals 



As well as being monotonic in both their program arguments (Thm. 8.1), sequential
composition and probabilistic choice (of which conditional choice is a special
case) satisfy the following basic laws corresponding to classical probabilistic
equalities.

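Lemma 9.8 Basic composition and choice laws For all programs S, T and
R and probabilities p.v.h the following properties hold.

$$
\begin{array}{rcll}
S;(T;R) & = & (S;T);R & (32)\\
\mathsf{skip};S & = & S;\mathsf{skip} \;=\; S & (33)\\
\mathsf{abort};S & = & S;\mathsf{abort} \;=\; \mathsf{abort} & (34)\\
(S \mathbin{{}_{p.v.h}\oplus} T) & = & (T \mathbin{{}_{1-p.v.h}\oplus} S) & (35)\\
(S \mathbin{{}_{p.v.h}\oplus} T);R & = & (S;R \mathbin{{}_{p.v.h}\oplus} T;R) & (36)\\
(S \mathbin{{}_{1}\oplus} T) & = & S & (37)
\end{array}
$$

Additionally, for any R satisfying both R;{p.v.h} = {p.v.h};R and R;{1−p.v.h} =
{1−p.v.h};R we have that R distributes from the left into a choice with
probability p.v.h:

$R;(S \mathbin{{}_{p.v.h}\oplus} T) \;=\; (R;S \mathbin{{}_{p.v.h}\oplus} R;T)$   (38)

□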



Since both assertions (11) and reveals (30) commute over assertions, equation
(38) gives us that they distribute to the right over arbitrary probabilistic (and
conditional) choices. Additionally, commutativity of constant assertions over
all statements (12) means that all programs distribute over choices in which
the probability is constant. We can also derive, for example, the following
properties:



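$$
\begin{array}{rcll}
S \mathbin{{}_{p.v.h}\oplus} S & \sqsubseteq & S & (39)\\
S \mathbin{{}_{p.v}\oplus} S & = & S & (40)\\
\mathsf{if}\ G\ \mathsf{then}\ S\ \mathsf{else}\ T & = & \mathsf{reveal}\ G;\ \mathsf{if}\ G\ \mathsf{then}\ S\ \mathsf{else}\ T & (41)
\end{array}
$$

The first two follow from left distributivity (33, 36, 21) and Lem. 9.7. The last
follows from (38) and (31).

9.7 Rules for general iteration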



Recall that we write while p.v.h do S od for a general iteration of S, with
probability p.v.h of executing S again, and so 1−p.v.h of exiting the loop, on each
iteration. From Thm. 8.1 we have that such loops are monotonic in their program
argument. Additionally, from the loop's least-fixed-point semantics we have

Lemma 9.9 Fixed point rule If (S;W) p.v.h⊕ skip = W then we have the
refinement (while p.v.h do S od) ⊑ W.

Proof: From the Tarski fixed-point theorem [46] wrt the order (≤), and the fact
that the loop is a least fixed point, we have immediately

$(S;W) \mathbin{{}_{p.v.h}\oplus} \mathsf{skip} \le W$ implies $(\mathsf{while}\ p.v.h\ \mathsf{do}\ S\ \mathsf{od}) \le W$.   (42)

The result then follows immediately from the two inclusions (=) ⊆ (≤) ⊆ (⊑).
□

In a specification task, however, the goal is typically to implement a speci- 
fication by an iteration, i.e. to establish a refinement in the opposite direction. 
For terminating iterations we have this rule: 

Corollary 9.1 Terminating iteration If while p.v.h do S od terminates with
probability one, and (S;W) p.v.h⊕ skip = W, then while p.v.h do S od = W.
Proof: We adapt the proof of Lem. 9.9, noting that if the loop terminates it
is (≤)-maximal and hence the rhs (≤) in (42) must in fact be an equality. □



Termination is usually shown by exhibiting a probabilistic variant over the 
state [TH[M1[3S]; a simple case is when the loop's exit probability
is bounded away from zero, in particular while k do ··· od for any constant k < 1.



9.8 Small example: one guess at a password 

We have a hidden password p chosen from three possibilities 𝒫 := {p1,p2,p3}.
This fragment describes an attacker's single guess, uniformly chosen:

||[ vis g; g:∈ ⦃p1,p2,p3⦄; reveal g=p ]|| .

Local visible variable g is chosen from the uniform distribution ⦃p1,p2,p3⦄, and
then it is used as a guess. Note that if the guess is correct, then T is revealed,
which in itself does not reveal the password's value: that is then learned
by deduction, from the program's code and the fact that g is visible. If g had
been hidden, we would know only that the guess had succeeded, but still not
the value of p.

We now show how algebra can be used to convert "operational" descriptions
like the above into less obvious but more calculationally convenient forms, in
this case a single reveal statement; and in §10 we will see how useful this
equivalence turns out to be. For now, we reason
  ||[ vis g; g:∈ ⦃p1,p2,p3⦄; reveal g=p ]||

= ||[ vis g;
      (g:=p1 ⊕ g:=p2 ⊕ g:=p3);
      reveal g=p ]||                          "split visible choice using (1);
                                               note choices (⊕) are uniform
                                               by convention, i.e. (1/3⊕)
                                               in this case"

= ||[ vis g;
      (   g:=p1; reveal g=p
        ⊕ g:=p2; reveal g=p
        ⊕ g:=p3; reveal g=p ) ]||             "left distributivity (36)"

= ||[ vis g;
      (   g:=p1; reveal p1=p
        ⊕ g:=p2; reveal p2=p
        ⊕ g:=p3; reveal p3=p ) ]||            "replace expressions by those of equal value §9.1;
                                               e.g. in the first branch g:=p1 establishes
                                               p1=g, so that g can be replaced by p1
                                               in the reveal"

=   ||[ vis g; g:=p1 ]||; reveal p1=p
  ⊕ ||[ vis g; g:=p2 ]||; reveal p2=p
  ⊕ ||[ vis g; g:=p3 ]||; reveal p3=p         "shift scope (§9.1), since g is
                                               no longer free in the reveals"

= reveal p1=p ⊕ reveal p2=p ⊕ reveal p3=p    "||[ vis g; g:=pi ]|| = reveal pi = skip (21); and (33)"

= reveal (p1,p1=p) ⊕ reveal (p2,p2=p) ⊕ reveal (p3,p3=p)    "Lem. 9.1"

= reveal ((p1,p1=p) ⊕ (p2,p2=p) ⊕ (p3,p3=p)) ,              "Lem. 9.7"

giving a single reveal whose expression-part we manipulate further, at (46)
below. Note that it is the appeal to (1) that relies on g's being visible: if it were
not, then the implicit flow introduced by the first step would represent a leak,
invalidating the equality.
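The end-to-end equality of this calculation, and its dependence on g's being visible, can be checked numerically. In the Python sketch below, passwords are encoded as 0, 1, 2 and the non-uniform prior is a hypothetical choice. The sketch compares the hyper of the original program, where the observer sees both g and g=p, with that of the single revelation (46). It also shows that with a hidden g the same fragment would reveal nothing at all, since a uniform guess succeeds with probability 1/3 whatever p is.

```python
from collections import defaultdict
from fractions import Fraction as Fr

PW = (0, 1, 2)                                       # stand-ins for p1, p2, p3
prior = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(1, 6)}      # hypothetical prior on p

def hyper(channel):
    """Output hyper {posterior (frozenset of (p, prob)): outer probability}."""
    joint = defaultdict(dict)
    for p, pp in prior.items():
        for o, po in channel[p].items():
            joint[o][p] = joint[o].get(p, 0) + pp * po
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((p, q / total) for p, q in col.items())] += total
    return dict(out)

# ||[ vis g; g:∈ ⦃p1,p2,p3⦄; reveal g=p ]||: the observer sees g (visible) and g=p
guess_visible = {p: {(g, g == p): Fr(1, 3) for g in PW} for p in PW}
# (46): reveal ⦃is p, isn't p+, isn't p-⦄
spec46 = {p: {("is", p): Fr(1, 3), **{("isnt", q): Fr(1, 3) for q in PW if q != p}}
          for p in PW}
# the same fragment with g hidden: only the Boolean g=p is observed
guess_hidden = {p: {True: Fr(1, 3), False: Fr(2, 3)} for p in PW}

print(hyper(guess_visible) == hyper(spec46))                      # True
print(hyper(guess_hidden) == {frozenset(prior.items()): Fr(1)})   # True: nothing leaks
```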



10 Extended example: iterative reasoning 

We now demonstrate our treatment of iteration, reusing the simple password- 
guessing attack within a loop. 

10.1 A password attack: specification 

We assume a set of passwords 𝒫 and a hidden variable p:𝒫 containing the
(current) password; let F_N𝒫 be the set of all size-N subsets of 𝒫. A typical
attack would be to choose one of those sets of potential passwords, and then to
try them all in a "bulk attack" as in the program fragment

||[ vis G; G:∈ ⦃F_N𝒫⦄; reveal {p}∩G ]|| .   (43)
(We omit the typing G:F𝒫, to reduce clutter.) The statement G:∈ ⦃F_N𝒫⦄ makes
a uniform choice of size-N subset of 𝒫, assigning it to G. We are assuming that
N is strictly less than the size P of 𝒫.

The reveal {p}∩G reveals either {p}, if the attack succeeds, or the empty
set if it does not. That is, the outcome of fragment (43) above is either to say
"the hidden password is p" (a successful attack, revealing {p}) or "the hidden
password is not in G" (an unsuccessful attack, revealing ∅) since, in the latter
case, we do know the visible attack-set G even though the attack failed. As a
specification, it abstracts from precisely how the passwords are tried, in what
order, or whether possibly repeated: it says only that they are tried.

Now suppose the incoming distribution of p is some π: D𝒫; then the program
fragment above produces an output hyper H: D²𝒫 comprising a distribution of
distributions over 𝒫. (Note that the output hyper contains no G component,
because G is local.) If we calculated this with our semantics (although we omit
the calculations here), we would find two kinds of inners in its support, namely

success A p-indexed family of point inner distributions, each itself with
outer probability N(π.p)/P: the probability π.p that p was the password,
multiplied by the probability N/P that it was in the uniformly chosen
attack-set G of size N.

failure A G-indexed set of inner distributions of support-size P−N, each such
distribution derived by conditioning π on not being in the set G and having
outer probability $(1-\pi.G)/C^P_N$: the probability $1/C^P_N$ that this particular G was
chosen for the attack-set, multiplied by the probability 1−π.G that the password
was not in it (where π.G is the probability that π assigns to the set G).

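As a check, we note that the outer probabilities sum to one, as they should since
the specification program is terminating: we have

$$
\begin{array}{cl}
 & \sum_p N(\pi.p)/P \;+\; \sum_G (1-\pi.G)/C^P_N \\
= & N/P \;+\; \sum_G 1/C^P_N \;-\; \sum_G \pi.G/C^P_N \\
= & N/P \;+\; 1 \;-\; N/P \\
= & 1 \ ,
\end{array}
$$

where $\sum_G \pi.G/C^P_N = N/P$ because each p lies in exactly $C^{P-1}_{N-1}$ of the $C^P_N$ attack-sets, and $C^{P-1}_{N-1}/C^P_N = N/P$.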

Finally, if for example we assume that the incoming distribution π is uniform
over 𝒫, then the Bayes Risk before the attack is 1 − 1/P and, after the attack,
it has been reduced to the conditional Risk
$P \times N(\pi.p)/P \times 0 \;+\; C^P_N \times \big((1-N/P)/C^P_N\big) \times \big(1 - 1/(P-N)\big)$,
that is, to 1 − (N+1)/P.
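These figures can be confirmed by brute-force enumeration. In the Python sketch below, P=5 and N=2 are arbitrary small values and the passwords are encoded as 0..P−1. The sketch builds the channel of (43), in which the observer sees the attack-set G together with {p}∩G, and computes the output hyper. It then checks that the outer probabilities sum to one and that the Bayes Risk after the attack is 1 − (N+1)/P for a uniform π.

```python
from collections import defaultdict
from fractions import Fraction as Fr
from itertools import combinations
from math import comb

P, N = 5, 2                                    # hypothetical sizes, N < P
PW = range(P)
SETS = list(combinations(PW, N))               # all size-N attack-sets, chosen uniformly

def hyper(prior):
    """Output hyper of ||[ vis G; G:∈ ⦃F_N P⦄; reveal {p}∩G ]||."""
    joint = defaultdict(dict)
    for p, pp in prior.items():
        for G in SETS:
            obs = (G, frozenset({p}) & frozenset(G))   # G is visible, so it is observed
            joint[obs][p] = joint[obs].get(p, 0) + pp * Fr(1, comb(P, N))
    out = defaultdict(Fr)
    for col in joint.values():
        total = sum(col.values())
        out[frozenset((p, q / total) for p, q in col.items())] += total
    return dict(out)

def bayes_risk(hyp):
    return 1 - sum(outer * max(q for _, q in inner) for inner, outer in hyp.items())

uniform = {p: Fr(1, P) for p in PW}
hyp = hyper(uniform)
print(sum(hyp.values()), 1 - Fr(1, P), bayes_risk(hyp))   # 1 4/5 2/5
```

The printed values are the total outer probability, the risk before the attack, and the risk after it.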

10.2 A password attack: implementation 

We suppose a simple-minded actual attacker who chooses single passwords
uniformly at random, possibly with repetition and, after each attempt, continues
with some fixed probability c (so giving up with probability 1−c). This would be
described by the fragment

while c do
    ||[ vis g; g:∈ 𝒫; reveal g=p ]||     (44)
od .
A complete analysis of (44) is combinatorially complex, having an output
hyper comprising inner distributions over subsets of 𝒫 of all possible sizes and,
as such, would be difficult to reason about within a larger system. It would be
more practical to determine, once and for all, whether (44) is an implementation
of (43), that is, whether it is at least as secure as (43), and then ever after to use
the simpler (43) in larger analyses. Since (44) is parametrised by c, we might in fact ask

What is the largest value of probability c for which (43) ⊑ (44)?



10.3 Example refinement analysis: the simplest case 

To illustrate the approach, we address the above question in the very simple
case where 𝒫={p1,p2,p3} is of size 3, and our specification describes a "bulk
attack" of size N=1. Thus we are asking for the largest c that achieves the
refinement

||[ vis g; g:∈ 𝒫; reveal g=p ]||  ⊑  while c do
                                         ||[ vis g; g:∈ 𝒫; reveal g=p ]||
                                     od .   (45)



We do this in two stages: the first is to hypothesise a parametrised straight-line
equivalent for the loop and then to synthesise a condition on the parameters that
makes it satisfy the fixed-point equation of Cor. 9.1. As in Lem. 9.6 we
introduce a discriminated union 𝒫' := is 𝒫 + isn't 𝒫 + nix
which, used in reveal commands, will allow us to reveal what p is, what it is
not, and (for algebraic convenience) to reveal nothing at all.

In our simple case here of 𝒫 having just three elements, 𝒫' therefore has
seven. Further exploiting 𝒫's size of three, for any p in 𝒫 we write p+ for one of
the values p is not, and p− for the other. With this approach we can express the
lhs of (45) without its local block and the guess variable g: for that, we return to
our example calculation of §9.8, giving reveal ((p1,p1=p) ⊕ (p2,p2=p) ⊕ (p3,p3=p)).
We can recode this directly using Lem. 9.1; it becomes just

reveal ⦃is p, isn't p+, isn't p−⦄ .   (46)

We return to the synthesis of the loop's straight-line equivalent, supposing
it has the form

reveal ⦃is p@x, isn't p+@y/2, isn't p−@y/2, nix@z⦄   (47)

for some probabilities x+y+z = 1 that we have to determine. This reveals what
p is with probability x, what p is not with probability y/2+y/2 = y, and with
probability z it reveals nothing at all.



^^In this simple case a bulk attack of size N=2 is uninteresting, because it would reveal
everything: either what the password is (if p∈G) or two values that it is not (if p∉G). In the
latter case we would deduce p's value anyway, by elimination.
Our synthesising equality is then given by Cor. 9.1, because the loop with
its constant c terminates; that is, we require

    reveal ⦃is p@x, isn't p+@y/2, isn't p−@y/2, nix@z⦄          (47)
=
    ( reveal ⦃is p, isn't p+, isn't p−⦄;                          loop body
      reveal ⦃is p@x, isn't p+@y/2, isn't p−@y/2, nix@z⦄ )        (47)
  c⊕ skip ,                                                       loop exit

whose right-hand side we can simplify with the revelation laws from §9, in
particular Lems. 9.1, 9.3 and 9.7. That gives



reveal ⦃ is p @ c(x + 2y/3 + z/3),
         isn't p+ @ c(y/6 + z/3),
         isn't p− @ c(y/6 + z/3),
         nix @ 1−c ⦄



and that should be equal to the left-hand side, the original (47). Since z = 1−c
trivially, we concentrate on the p+ case to obtain y/2 = c(y/6 + (1−c)/3), so that
y = 2c(1−c)/(3−c), whence x = c−y = c(1+c)/(3−c).
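The synthesised values can be checked against the fixed-point equation directly. The Python sketch below is our own illustration, not the paper's calculation. It encodes each revelation outcome as the set of passwords still consistent with it: is p becomes {p}, isn't q becomes 𝒫∖{q}, and nix becomes 𝒫. This loses no information in this example, because every outcome is equally likely under each password consistent with it. Composing two revelations then amounts to intersecting these sets, in the spirit of Lems. 9.3 and 9.1. The code verifies the equation exactly for a few rational c. Separately, it iterates the loop's unfolding from abort and shows convergence to the straight-line form (47).

```python
from fractions import Fraction as Fr

PW = frozenset({0, 1, 2})                 # stand-ins for p1, p2, p3

def body(p):
    """One uniform visible guess g then reveal g=p, as 'consistent set' outcomes."""
    out = {}
    for g in PW:
        k = frozenset({g}) if g == p else PW - {g}
        out[k] = out.get(k, 0) + Fr(1, 3)
    return out

def seq(d1, d2):
    """Two revelations in sequence, keeping the intersection of what each allows."""
    out = {}
    for k1, a in d1.items():
        for k2, b in d2.items():
            out[k1 & k2] = out.get(k1 & k2, 0) + a * b
    return out

def unfold(c, W):
    """Right-hand side of the fixed-point equation: (body; W) c(+) skip."""
    res = {}
    for p in PW:
        d = {k: c * q for k, q in seq(body(p), W[p]).items()}
        d[PW] = d.get(PW, 0) + (1 - c)
        res[p] = d
    return res

def closed_form(c):
    """The straight-line candidate (47) with x, y, z as synthesised above."""
    x, y, z = c * (1 + c) / (3 - c), 2 * c * (1 - c) / (3 - c), 1 - c
    W = {}
    for p in PW:
        W[p] = {frozenset({p}): x, PW: z}
        for q in PW - {p}:
            W[p][PW - {q}] = y / 2
    return W

for c in (Fr(1, 4), Fr(1, 2), Fr(2, 3)):
    assert unfold(c, closed_form(c)) == closed_form(c)   # exact fixed point

W = {p: {} for p in PW}                   # start from abort: no mass at all
for _ in range(100):
    W = unfold(0.5, W)
print(round(W[0][frozenset({0})], 12))    # 0.3, i.e. x = c(1+c)/(3-c) at c = 1/2
```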



10.4 Establishing the c-optimal refinement: 
the second stage 

We now want to find the largest value of c that allows

||[ vis g; g:∈ 𝒫; reveal g=p ]||  ⊑  reveal ⦃is p@x, isn't p+@y/2, isn't p−@y/2, nix@z⦄ ,

where x, y, z have the c-determined values calculated above. We recall the remark
above at (46) about formulating our specification as a simple revelation, without
needing a local variable g. That gives the equivalent goal

reveal ⦃is p, isn't p+, isn't p−⦄  ⊑  reveal ⦃is p@x, isn't p+@y/2, isn't p−@y/2, nix@z⦄ .



^^Working through this and extracting the arithmetic results in the following table:

  first reveal   second reveal   with prob.   effective joint revelation
  is p           is p            x/3          is p
  is p           isn't p+        y/6          is p
  is p           isn't p−        y/6          is p
  is p           nix             z/3          is p
  isn't p+       is p            x/3          is p
  isn't p+       isn't p+        y/6          isn't p+
  isn't p+       isn't p−        y/6          is p
  isn't p+       nix             z/3          isn't p+
  isn't p−       is p            x/3          is p
  isn't p−       isn't p+        y/6          is p
  isn't p−       isn't p−        y/6          isn't p−
  isn't p−       nix             z/3          isn't p−
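For this we refer to Lem. 9.1, whose D is effectively the lhs above: written
as a matrix (rows indexed by the hidden p, columns by the outcome in 𝒫'), it would be

$$
\begin{array}{c|ccccccc}
 & \text{is } p_1 & \text{isn't } p_1 & \text{is } p_2 & \text{isn't } p_2 & \text{is } p_3 & \text{isn't } p_3 & \text{nix} \\ \hline
p_1 & 1/3 & 0 & 0 & 1/3 & 0 & 1/3 & 0 \\
p_2 & 0 & 1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
p_3 & 0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0
\end{array}
\qquad (48)
$$

We need a 7×7 stochastic matrix F, that is a function 𝒫' → D𝒫' which, when
multiplied after D, gives the rhs above, that is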





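$$
\begin{array}{c|ccccccc}
 & \text{is } p_1 & \text{isn't } p_1 & \text{is } p_2 & \text{isn't } p_2 & \text{is } p_3 & \text{isn't } p_3 & \text{nix} \\ \hline
p_1 & x & 0 & 0 & y/2 & 0 & y/2 & z \\
p_2 & 0 & y/2 & x & 0 & 0 & y/2 & z \\
p_3 & 0 & y/2 & 0 & y/2 & x & 0 & z
\end{array}
$$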



The columns of the latter must be interpolations of columns of the former;
thus the first rhs column [x,0,0] cannot contain non-zero contributions from
any other than the first lhs column [1/3,0,0]. Hence x ≤ 1/3 and, since we are
trying to maximise c, we maximise x also by setting x := 1/3. Similar reasoning
then establishes that the second rhs column [0,y/2,y/2] must be obtained by
taking proportion 3y/2 of the second lhs column; and then the last rhs column
is made by combining proportions 1−3y/2 of each of columns 2, 4 and 6 on the lhs.

Since x=1/3 entails c(1+c)/(3−c)=1/3, that is 3c²+4c−3=0, whose root in [0,1] is
c = (√13−2)/3 ≈ 0.53, we have established our desired (45) with c taking that
value (or less), independently of the distribution with which the hidden p
might have been chosen.
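The argument can also be checked numerically. The Python sketch below builds D from (48), the rhs matrix from x, y and z, and one candidate F. The string encoding of outcomes and this particular F are our own choices for illustration. In F, each is p_i outcome keeps proportion 3x and sends the rest to nix, and each isn't p_i outcome keeps proportion 3y/2 and sends the rest to nix. The sketch confirms that D·F equals the rhs for every c. It also shows that this F is a genuine stochastic matrix up to c = (√13−2)/3 but not beyond, where 3x exceeds 1.

```python
import math

IDX = (1, 2, 3)                                   # passwords p1, p2, p3
OUT = ["is1", "isnt1", "is2", "isnt2", "is3", "isnt3", "nix"]   # column order of (48)

def xyz(c):
    """The c-determined probabilities synthesised in 10.3."""
    return c * (1 + c) / (3 - c), 2 * c * (1 - c) / (3 - c), 1 - c

def D_row(p):
    """Row p of D: reveal ⦃is p, isn't p+, isn't p-⦄ (uniform)."""
    row = {f"is{p}": 1 / 3}
    row.update({f"isnt{q}": 1 / 3 for q in IDX if q != p})
    return row

def rhs_row(p, x, y, z):
    """Row p of the rhs: reveal ⦃is p@x, isn't p+@y/2, isn't p-@y/2, nix@z⦄."""
    row = {f"is{p}": x, "nix": z}
    row.update({f"isnt{q}": y / 2 for q in IDX if q != p})
    return row

def candidate_F(x, y):
    """One choice of F: keep 3x of each is-outcome and 3y/2 of each isn't-outcome,
    sending what is left over to nix (our construction, for illustration)."""
    F = {"nix": {"nix": 1.0}}
    for i in IDX:
        F[f"is{i}"] = {f"is{i}": 3 * x, "nix": 1 - 3 * x}
        F[f"isnt{i}"] = {f"isnt{i}": 1.5 * y, "nix": 1 - 1.5 * y}
    return F

def times(row, F):
    """Multiply one row of D by the matrix F."""
    out = dict.fromkeys(OUT, 0.0)
    for o, a in row.items():
        for o2, b in F[o].items():
            out[o2] += a * b
    return out

def is_stochastic(F, tol=1e-12):
    return all(min(r.values()) >= -tol and abs(sum(r.values()) - 1) < tol
               for r in F.values())

def matches(c):
    """(D.F equals the rhs?, is F stochastic?) for the given c."""
    x, y, z = xyz(c)
    F = candidate_F(x, y)
    return all(abs(times(D_row(p), F)[o] - rhs_row(p, x, y, z).get(o, 0.0)) < 1e-12
               for p in IDX for o in OUT), is_stochastic(F)

c_star = (math.sqrt(13) - 2) / 3                  # root of 3c^2 + 4c - 3 = 0, i.e. x = 1/3
for c in (0.3, 0.5, c_star, 0.6):
    print(f"c = {c:.4f}: D.F = rhs {matches(c)[0]}, F stochastic {matches(c)[1]}")
```

For c = 0.6 the product still matches algebraically, but F then has a negative entry, in line with the first-column argument above showing that no stochastic F can exist once x > 1/3.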



11 Related work 

11.1 HMM's, algebra and noninterference

Hidden Markov Models [22] have a long history and many practical applications;
their conceptual connection to noninterference suggests that their algorithmic
methods might be of use here. That is, extant HMM techniques could be used
for efficient numerical calculation of whether some T_S, E_S, a specification, was
secure enough for our purposes: once that was done, the refinement relation
established via program algebra could ensure that an implementation T_I, E_I
was at least as secure as that without requiring a second numerical calculation.
The advantage of this is that the first calculation, over a smaller and more
abstract system, is likely to be much simpler than the second would have been.

There are techniques based on the manipulation of "graphical models" to
represent Bayesian networks in alternate equivalent ways: these are similar in
spirit to our algebraic manipulations [7], although there the motivation is usually
to find more efficient algorithms.

The probabilities in the text come from adding the final column in groups.

^*We write transposed columns horizontally as rows between brackets [·] instead of
parentheses.

^^The function x := c(1+c)/(3−c) is monotonic for 0<c<1.

The application of HMM's to noninterference security is recent: originally,
noninterference was qualitative [TSj . Probabilistic noninterference [44, 11] is a
generalisation of that idea to provide weaker statements concerning an attacker's
ability to guess high-security state by observing the behaviour and pattern of
observables. Variations of the idea have been studied extensively for concurrent
systems [43, 48] and taking computational issues into account [6].

The definition of our space D(V×DH) and its refinement order draws inspiration
from constructions and techniques already present in the literature. The
monad is Giry/Kantorovich [13 HI], and the refinement order is related to the
theory of inhomogeneous Markov Chains [T2] .

11.2 Compositionality, information theory 
and assorted entropies 

A compelling approach to quantitative security is to use information-theoretic 
measures to compare the (e.g. Shannon) entropy of the hidden variables' a priori 
distribution (e.g. their incoming values) and their a posteriori distribution once 
the program has executed [TUl El [3] ; recently this has been applied to iterating
programs as well [55, 30]. But compositionality is crucial: given that one
program is more secure than another according to some entropy-based criterion,
how do we know that inequality is preserved in a larger context? 

We have shown earlier [27] that refinement has two key properties for
compositional entropy-based reasoning: it is preserved by contexts; and it implies
non-decrease for an assortment of entropies, including Shannon Entropy, Guessing
Entropy, Bayes Risk and Marginal Guesswork. Perhaps it applies to others [S].

Thus our work here is part of a larger program to unite earlier work in
quantitative information flow (or escape) [S] [24] in channels, as models of
computation, with a denotational presentation of program semantics based on HMMs
including a compositional refinement relation that compares these quantitative 
measures between programs, specifically between specifications and their pur- 
ported implementations. By considering iterations, we are extending our own 
earlier work [27] in a way that relates to others' work on quantitative information 
flow from iterations [26l |4Q] much as in the way described above. 

Compositionality "within" a program addresses the question of whether se- 
curity established for a component is preserved when embedded in a larger 
context [8]. Compositionality "between" programs, as we do here, addresses
the question of whether two programs' relative security is preserved when they 
are both placed in the same context: this latter is less common. 

A representative example of others' doing so is recent work by Yasuoka and 
Terauchi [5T] in which computational hardness is analysed. They consider deter- 
ministic sequential straight-line programs, i.e. without probabilistic or demonic 
choice and without loops, but that nevertheless operate in a quantitative con- 
text (i.e. having input distributions rather than simply input values). Since the 
programs have no probabilistic choices, those authors are able to reduce the 
(analogue of) the secure-refinement relation to the qualitative noninterference
comparison of the programs. This is a special case of our general conjecture
concerning the promotion of qualitative results to quantitative results provided
demonic choice is replaced by uniform choice [27, Sec. 8.1]: if there is no demonic
choice, there is nothing to replace and so the program is unchanged. 

These authors look for a relation guaranteeing the correct entropy ordering 
(for all incoming distributions) wrt a selection of entropies, as we do, and they 
address the computational hardness of validating that relationship in particular 
cases. We address with compositional closure the additional question of how 
weak such a relation can be [27].

12 Summary, conclusions, prospects 

Earlier we built the core of a programming algebra for probability and noninter- 
ference: here we have extended it to include iteration and nontermination; and 
we solved the technical problem of incompleteness, that arose in the process, 
by introducing a simpler "termination" order that allowed us to remain with 
discrete distributions. Further, we have shown how the semantics is related
to HMM's, an existing consensus of how such application domains should be
handled and analysed.

The formalist rigour of program semantics, however, can make unusual de- 
mands on traditional mathematical presentations: a programming language is 
interpreted inductively in a structured space equipped with operators corre- 
sponding to the constructors of that language. In particular, sequential pro- 
grams with any kind of nondeterminism (whether demonic, probabilistic or 
some other) are often interpreted as functions of type 𝕊→𝕂𝕊 where 𝕊 is
the state space and 𝕂 is some type constructor (or functor) expressing the
nondeterminism. Thus our first contribution in detail was to (re-)interpret HMM's
in this style (in fjs]), where 𝕊 was V×H and 𝕂 became D (in p^I] ). We made
some small programming-motivated extensions to the HMM model, in particular
adding visible variables to the state so that the most recent observation
is carried forward into the next operation. The second extension was allowing
iterations and hence, potentially, computations that might not terminate (thus
sub-distributions rather than full distributions D).

In constructing the semantic operations we built-in perfect recall and implicit 
flow, which are security assumptions about the power of the attacker. This can 
be controversial: in general one can choose to impose these or not. We did 
impose them because we have argued extensively elsewhere [3S1 [37] that a
compositional definition of program refinement is not possible otherwise. Perfect
recall in particular, however, does seem a good fit for HMM's independently
of the refinement argument, since the knowledge gained from observations,
once emitted from the output-side of an HMM, cannot be expunged from the 
attacker's repertoire by any kind of overwriting subsequently. 

Footnote: We did not have space to repeat those arguments here.
Our second contribution was to work around $(\sqsubseteq)$-incompleteness by using an alternative, more specialised order $(\preceq)$, showing that a program algebra including iteration is feasible; and our third contribution was to argue by example (§10) that the resulting source-level reasoning is promising.

There are two immediate prospects for further work. One is that in practice we would like to answer questions like the one posed in §10.2 for general guesses of size N and large password spaces P, and many other similar questions. For this we would need tool support both for the semantics (i.e. given a program, determine its meaning) and for establishing refinement (i.e. whether this meaning is refined by that one) in a probabilistic setting [53, 30].

The other prospect is to complete our semantic space to proper measures, in fact to follow the approach outlined in §6.2. Beyond compositionality of $(\sqsubseteq)$ we want its compositional closure, already achieved for straight-line programs, guaranteeing that the refinement relation is not unnecessarily strong; but that argument required (analytical) closure/compactness of a set of finite, discrete probability distributions in a metric space [27]; and to do that here, with the extra feature of iterations that generate chains of approximants, seems to make the move to measures inevitable.

Finally, our longer-term aim is to add demonic choice to the model, e.g. for demonic scheduling that takes into account what the adversary can and cannot see [3]. We have done this for qualitative systems [58, 57], and we have earlier combined demonic and probabilistic choice without hiding [20, ...]. The technique of convex closure, useful for that, generates uncountably many interpolated distributions: it is a second reason we are likely to need measures, and so we hope to exploit the structures developed for this paper at that later point.
These appendices contain preliminary background material intended to support further extension of the main results above in a separate, subsequent publication; they were written after the main report and are not strictly part of it.



A The Kantorovich metric and its related probability monad

We begin by recalling the notation and structures we have been using for the discrete case. Let $\mathbf{D}$ be a finite set, of size some $N$; then $\mathbb{D}\mathbf{D}$ is the set of discrete distributions over it. The Manhattan metric between two of those distributions $\delta_{\{1,2\}}$ is then given by $\mathbf{m}.\delta_1.\delta_2 := \sum_{d:\mathbf{D}} |\delta_1.d-\delta_2.d|$. Scaled by $1/N$, the metric becomes 1-bounded; in any case, the usual Euclidean topology is induced, effectively on $\mathbb{R}^N$, and the space is compact.
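The following minimal Python sketch, under our own assumptions, computes the Manhattan metric $\mathbf{m}$ and its $1/N$-scaled variant for discrete distributions represented as dictionaries from points to exact `Fraction` weights; the representation and the names `manhattan` and `scaled_manhattan` are illustrative only.

```python
# Sketch under our own assumptions: a distribution is a dict from points to
# exact Fraction weights (missing keys mean weight 0); names are illustrative.
from fractions import Fraction as F

def manhattan(d1, d2):
    """m.d1.d2 := sum over points d of |d1.d - d2.d|."""
    support = set(d1) | set(d2)
    return sum((abs(d1.get(x, 0) - d2.get(x, 0)) for x in support), F(0))

def scaled_manhattan(d1, d2, n):
    """The same metric scaled by 1/N, N being the size of the base set."""
    return manhattan(d1, d2) / n

base = ["a", "b", "c"]                    # a base set D with N = 3
delta1 = {"a": F(1, 2), "b": F(1, 2)}
delta2 = {"c": F(1)}                      # support disjoint from delta1's
print(manhattan(delta1, delta2))                      # 2
print(scaled_manhattan(delta1, delta2, len(base)))    # 2/3
```

Disjoint supports give the largest unscaled distance, 2, so for $N\ge 2$ the scaled metric never exceeds 1.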

Now more generally, given a metric space, its Borel algebra is the smallest sigma-algebra containing the open sets; and the combination of such a space and its Borel algebra is a measurable space. The space $(\mathbb{D}\mathbf{D},\mathbf{m})$ of discrete distributions above is such a metric space; and we can therefore define measure spaces $(\mathbb{D}\mathbf{D},\mathcal{B}(\mathbb{D}\mathbf{D},\mathbf{m}),\mu)$ over $\mathbb{D}\mathbf{D}$, where $\mathcal{B}$ constructs the required Borel algebra: these measures $\mu$ are distributions of distributions over $\mathbf{D}$, what we have been calling hypers. The special case of discrete hypers could be said to be the set $\mathbb{D}^2\mathbf{D}$.

Because $(\mathbb{D}\mathbf{D},\mathbf{m})$ is compact and its metric 1-bounded (once scaled), the Kantorovich metric construction can be used to "lift" the underlying metric $\mathbf{m}$ to a new metric on the measures $\mu$ themselves, i.e. giving a distance between any $\mu_{\{1,2\}}$ [18]; and that lifted metric is again 1-bounded and makes the space of measures compact. This gives a new 1-bounded and compact metric space of measures over distributions over $\mathbf{D}$, which we write $\mathbb{M}\mathbb{D}\mathbf{D}$: it depends implicitly on the underlying metric $\mathbf{m}$ on $\mathbb{D}\mathbf{D}$. And because 1-boundedness and compactness are re-established, the process can be repeated, going on to form e.g. $\mathbb{M}^2\mathbb{D}\mathbf{D}$ etc.

In fact if $\mathbf{D}$ is itself given the discrete metric $\mathbf{d}.d_1.d_2 := 1$ (for $d_1\neq d_2$), then the Kantorovich construction gives (up to scaling) the Manhattan metric on $\mathbb{D}\mathbf{D}$ that we have already chosen, and so for $\mathbb{M}\mathbb{D}\mathbf{D}$ we could just as well write $\mathbb{M}^2\mathbf{D}$; but as a mnemonic aid we continue to use $\mathbb{D}$ where it applies.
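As a hedged illustration of this lifting, the sketch below computes the Kantorovich distance only in a special case chosen for simplicity: two uniform measures with the same number of atoms, where some optimal coupling is a permutation (Birkhoff-von Neumann), so brute force over permutations is exact. The general case needs a linear program. It checks on one example that the discrete metric on $\mathbf{D}$ yields half the Manhattan distance, and then lifts the (unscaled) Manhattan metric to a distance between two small hypers. All names are hypothetical.

```python
# Sketch under our own assumptions (uniform weights, equal atom counts);
# all names are illustrative, not from the report.
from fractions import Fraction as F
from itertools import permutations

def kantorovich_uniform(xs, ys, ground):
    """Kantorovich distance between the uniform measures on atom lists xs, ys
    (same length n, repeats allowed) for the ground metric `ground`. Some
    optimal coupling is a permutation, so brute force is exact (but O(n!))."""
    if len(xs) != len(ys):
        raise ValueError("this sketch needs the same number of atoms")
    n = len(xs)
    best = min(sum((F(ground(x, ys[j])) for x, j in zip(xs, perm)), F(0))
               for perm in permutations(range(n)))
    return best / n

def discrete(u, v):
    """The discrete metric: distance 1 between distinct points."""
    return 0 if u == v else 1

def empirical(xs):
    """The distribution giving each listed atom weight 1/len(xs)."""
    d = {}
    for x in xs:
        d[x] = d.get(x, 0) + F(1, len(xs))
    return d

def manhattan(d1, d2):
    return sum((abs(d1.get(k, 0) - d2.get(k, 0)) for k in set(d1) | set(d2)), F(0))

# Level 1: the discrete metric on D gives half the Manhattan metric on DD.
xs, ys = ["a", "a", "b", "c"], ["a", "b", "b", "b"]
print(kantorovich_uniform(xs, ys, discrete))            # 1/2
print(manhattan(empirical(xs), empirical(ys)) / 2)      # 1/2

# Level 2: the construction repeated, a distance between two hypers.
pa, pb = {"a": F(1)}, {"b": F(1)}
mix = {"a": F(1, 2), "b": F(1, 2)}
print(kantorovich_uniform([pa, pb], [pb, mix], manhattan))   # 1/2
```

Brute force is factorial in the number of atoms, so this is only suitable for checking tiny examples.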

Footnote: We use bold $\mathbf{D}$ for the underlying set, rather than e.g. a calligraphic $\mathcal{X}$ as in the main report, for notational reasons explained below.

Footnote: Note that $\delta_{\{1,2\}}$ were measures, actually discrete distributions over $\mathbf{D}$, whereas $\mu$ is a measure over $\mathbb{D}\mathbf{D}$.
A.1 Notation and conventions for measures and the Kantorovich functor

In the presentation further below we will be using many (at least six) different 
but related measurable spaces, based on metric spaces as above, with each one 
introducing at least seven derived variables: the space itself; the underlying set; 
the metric; the induced Borel-algebra; the measurable sets in that algebra; a 
measure (or measures) defined on the algebra; and finally a variable ranging 
over the underlying set itself. To achieve some (local) naming consistency, we 
follow these conventions systematically: 

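• The underlying set will be named in bold upper-case Roman, thus $\mathbf{A}$, and its elements will be lower-case Roman, so that for example $a\in\mathbf{A}$.

• The associated metric will be in bold lower-case Roman, thus $\mathbf{a}\in\mathbf{A}^2\to\mathbb{R}$.

• The metric space as a whole will be in underlined upper-case Roman, thus $\underline{\mathbf{A}}=(\mathbf{A},\mathbf{a})$.

• The Borel algebra induced by the metric $\mathbf{a}$ on $\mathbf{A}$ will be in the corresponding Roman calligraphic, thus $\mathcal{A}\subseteq\mathbb{P}\mathbf{A}$.

• The measurable sets in $\mathcal{A}$ will be in upper-case Roman, so that $a\in A\in\mathcal{A}$.

• Thus the measurable space derived from $(\mathbf{A},\mathbf{a})$ will be $(\mathbf{A},\mathcal{A})$.

• Measures over $(\mathbf{A},\mathcal{A})$ will be in lower-case Greek, so that $\alpha.A\in\mathbb{R}$, and then $(\mathbf{A},\mathcal{A},\alpha)$ is the measure space written in full. Abusing the types, we sometimes write $\underline{\mathbf{A}}$ for the set of those measures, thus by $\alpha\in\underline{\mathbf{A}}$ meaning that $(\mathbf{A},\mathcal{A},\alpha)$ is a measure space.

• The Kantorovich construction taking metric space $\underline{\mathbf{A}}$ to the metric space of measures over it will be written $\mathbb{M}\underline{\mathbf{A}}$. By a similar abuse we write $\beta\in\mathbb{M}\underline{\mathbf{A}}$ to mean that $\beta$ is one of those measures.

• Given a function $f{:}\,\mathbf{A}\to\mathbf{B}$ we write $\mathbb{M}f$ for the corresponding function between (the measures in) $\mathbb{M}\underline{\mathbf{A}}$ and $\mathbb{M}\underline{\mathbf{B}}$, with the usual definition $\mathbb{M}f.\alpha.B := \alpha.(f^{-1}.B)$.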
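A small sketch of the push-forward $\mathbb{M}f$ for a finite discrete measure, represented (our assumption) as a dictionary from points to weights: the weight of each image point collects the weight of its preimage, which is the definition $\mathbb{M}f.\alpha.B := \alpha.(f^{-1}.B)$ restricted to finite sets. The helper names are hypothetical.

```python
# Sketch under our own assumptions: finite measures as dicts; names illustrative.
from fractions import Fraction as F

def push_forward(f, alpha):
    """(M f).alpha: the weight of an image point b collects alpha's weight on f^{-1}.{b}."""
    result = {}
    for a, w in alpha.items():
        result[f(a)] = result.get(f(a), 0) + w
    return result

def measure(alpha, S):
    """alpha.S for a finite set S of points."""
    return sum((w for a, w in alpha.items() if a in S), F(0))

alpha = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}
parity = lambda a: a % 2
beta = push_forward(parity, alpha)
print(beta)   # {0: Fraction(1, 2), 1: Fraction(1, 2)}
# The defining equation (M f).alpha.B = alpha.(f^{-1}.B), for B = {1}:
preimage = {a for a in alpha if parity(a) in {1}}
print(measure(beta, {1}) == measure(alpha, preimage))   # True
```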

We will base our calculations below on van Breugel's presentation of the Kantorovich monad, whose relevant results we now summarise (but in the notation we have established above); the page references are to van Breugel's publication [47]. In fact we will be using subprobability measures to take nontermination into account [47, Sec. 5.3], and will assume that throughout the following.

Lemma A.1 (Facts concerning Kantorovich subprobability-measure monads and metrics)
(p. 13) The unit and multiplication of the monad $\mathbb{M}$ are defined as for the Giry monad, with an argument needed to show that the Giry-style definition of multiplication is meaningful for the Kantorovich construction [50]. The unit $\eta$ is the "make a point measure" function pnt and the multiplication $\boldsymbol{\mu}$ is the "average" function avg.

(p. 13) If metric space $\underline{\mathbf{X}}$ is compact, then $\mathbb{M}\underline{\mathbf{X}}$ is compact as well.

(p. 13) If metric space $\underline{\mathbf{X}}$ is complete, then $\mathbb{M}\underline{\mathbf{X}}$ is complete as well if we restrict ourselves to tight measures.

(p. 13) If a function $f$ is nonexpansive, then so is $\mathbb{M}f$.

(p. 13) Both the unit $\eta$ and the multiplication $\boldsymbol{\mu}$ of the $\mathbb{M}$-monad are nonexpansive.

(p. 14) $(\mathbb{M},\eta,\boldsymbol{\mu})$ is a monad on the category of 1-bounded compact metric spaces with nonexpansive functions between them.

□ 
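In the finite discrete case the unit pnt and the multiplication avg can be sketched as follows. This is our own illustration, not van Breugel's construction: it represents an inner distribution by a frozenset of (point, weight) pairs so that it can serve as a point of an outer distribution, and it checks the two unit laws and associativity on one example. All names are hypothetical.

```python
# Sketch under our own assumptions: finite distributions as dicts, frozen
# (hashable) when they are points of another distribution; names illustrative.
from fractions import Fraction as F

def freeze(d):
    """Hashable form of a finite distribution (zero weights dropped)."""
    return frozenset((k, v) for k, v in d.items() if v != 0)

def pnt(x):
    """Unit eta: the point distribution at x."""
    return {x: F(1)}

def avg(h):
    """Multiplication mu: flatten a distribution over frozen distributions."""
    out = {}
    for inner, w in h.items():
        for x, p in inner:
            out[x] = out.get(x, 0) + w * p
    return out

def fmap(f, d):
    """The functor on arrows: push d forward along f."""
    out = {}
    for x, p in d.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

d = {"h": F(1, 3), "t": F(2, 3)}
# Unit laws: mu . eta = id and mu . (M eta) = id.
print(avg(pnt(freeze(d))) == d)                        # True
print(avg(fmap(lambda x: freeze(pnt(x)), d)) == d)     # True
# Associativity: mu . mu = mu . (M mu), on a small super.
h1 = freeze({freeze({"h": F(1)}): F(1, 2), freeze({"t": F(1)}): F(1, 2)})
h2 = freeze({freeze(d): F(1)})
sup = {h1: F(1, 2), h2: F(1, 2)}
print(avg(avg(sup)) == avg(fmap(lambda h: freeze(avg(dict(h))), sup)))   # True
```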

B Antisymmetry of refinement 
B.1 Refinement of measures

We begin by revisiting our definition of refinement, placing it in the measure 
context. 

Definition B.1 (Entropy refinement of measure-hypers) Refinement is a relation on measures $\Delta$ in $\mathbb{M}\mathbb{D}\mathcal{X}$, i.e. hypers, that (informally) merges elements of $\mathbb{D}\mathcal{X}$ together based on the weights assigned to them by $\Delta$. More precisely we have the following:

1. Start with a finite set $\mathcal{X}$ with the discrete metric (such as our program state space).

2. From (1) construct $\mathbb{D}\mathcal{X}=\mathbb{M}\mathcal{X}$ with the Kantorovich/Manhattan metric. These are our discrete distributions.

3. From (2) construct $\mathbb{M}\mathbb{D}\mathcal{X}=\mathbb{M}^2\mathcal{X}$ with the Kantorovich metric. These are our hypers.

4. From (3) construct $\mathbb{M}^2\mathbb{D}\mathcal{X}=\mathbb{M}^3\mathcal{X}$ with the Kantorovich metric. These are our supers.

5. From (4) construct $\mathbb{M}^3\mathbb{D}\mathcal{X}=\mathbb{M}^4\mathcal{X}$ with the Kantorovich metric, which is where the "mega" lives that is used in the conjecture supporting transitivity.

Footnote: We retain the use of upper-case Greek letters for hypers, in spite of the conventions above, for consistency with the main paper.
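6. For two hypers $\Delta_{\{S,I\}}$ say that $\Delta_S$ is entropy-refined by $\Delta_I$, written $\Delta_S\sqsubseteq_{\mathrm{E}}\Delta_I$, just when there is some super $\boldsymbol{\Delta}$ such that
$$\Delta_S=\boldsymbol{\mu}.\boldsymbol{\Delta} \quad\land\quad (\mathbb{M}\boldsymbol{\mu}).\boldsymbol{\Delta}=\Delta_I~. \qquad (49)$$

□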
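To make (49) concrete in the finite case, the following sketch computes the pair $(\Delta_S,\Delta_I)=(\mathrm{avg}.\boldsymbol{\Delta},\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta})$ witnessed by a super $\boldsymbol{\Delta}$. The representation (hypers and supers as dictionaries keyed by frozen distributions) and all names are our own assumptions. In the example, a hyper revealing a coin's outcome and one revealing nothing are merged into a single posterior, and both sides keep the same prior.

```python
# Sketch under our own assumptions: hypers/supers as dicts keyed by frozen
# distributions (frozensets of (point, weight) pairs); names illustrative.
from fractions import Fraction as F

def freeze(d):
    """Hashable form of a finite distribution (zero weights dropped)."""
    return frozenset((k, v) for k, v in d.items() if v != 0)

def avg(m):
    """mu: flatten a distribution (dict) over frozen distributions."""
    out = {}
    for inner, w in m.items():
        for x, p in inner:
            out[x] = out.get(x, 0) + w * p
    return out

def map_avg(sup):
    """(M mu).sup: replace each hyper in the super by its average;
    hypers with the same average are thereby merged."""
    out = {}
    for hyper, w in sup.items():
        key = freeze(avg(dict(hyper)))
        out[key] = out.get(key, 0) + w
    return out

def refinement_pair(sup):
    """The pair (Delta_S, Delta_I) that the super witnesses, as in (49)."""
    return avg(sup), map_avg(sup)

dh, dt = {"h": F(1)}, {"t": F(1)}
u = {"h": F(1, 2), "t": F(1, 2)}
reveal = freeze({freeze(dh): F(1, 2), freeze(dt): F(1, 2)})  # outcome learned
hide = freeze({freeze(u): F(1)})                             # nothing learned
delta_S, delta_I = refinement_pair({reveal: F(1, 2), hide: F(1, 2)})
print(delta_I == {freeze(u): F(1)})   # True: every posterior merged into u
print(avg(delta_S) == avg(delta_I))   # True: both hypers have the same prior
```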

B.2 Notation for integration over measures

Because the coming calculations are intricate in places, we use a slightly nonstandard notation for integration over measures in order to make the manipulation of bound variables etc. absolutely explicit. Fix a measure space $(\mathbf{A},\mathcal{A},\alpha)$, with $\mathbf{A}$ the underlying space, $\mathcal{A}$ a sigma-algebra on it and $\alpha$ a measure. We consider the expression "$\mathit{exp}\ da$" simply to be an alternative notation for the lambda expression $(\lambda a\cdot\mathit{exp})$, i.e. with $da$ binding free occurrences of $a$ within $\mathit{exp}$ to make a function over $\mathbf{A}$. (Note that this convention accords perfectly with the notation for Riemann integration, in particular that the "$da$" binds occurrences of $a$ in the body.) Then $\int_\alpha^A \mathit{exp}\ da$ means

Consider the expression $\mathit{exp}$ to be a function of its free variable $a$, and integrate that function over the measure $\alpha$, but restricted to the measurable set $A$ in $\mathcal{A}$.

When $f$, say, actually is some function over $\mathbf{A}$ (rather than an expression containing $a$), then we write just $\int_\alpha^A f$ with no indication "$d\cdot$" of bound variable, and it is of course equivalent to $\int_\alpha^A f.a\ da$, where $\mathit{exp}$ is now the function application $f.a$. Finally, when there is no restricting set $A$ we can leave it off. Thus the simplest "normally notated" integration $\int f\,d\mu$ we would write instead as $\int_\mu f$.

Footnote: This is not done lightly: variant notation always imposes a barrier between writer and reader. As justification in this case, we quote "Sometimes the integral of a function $h$ with respect to a measure $\mu$, usually written as $\int h\,d\mu$ or $\int h(x)\,d\mu(x)$, will be written as $\int h(x)\,\mu(dx)$. This can make clearer what the variable of integration is..." [14, p. 347]. In the last case, what seems to be meant is that $\int h(x)\,\mu(dx,y)$ would be an integration of function $h$ of $x$ over a measure $\mu$ on that $x$ with $\mu$ depending on some $y$, which we would write in our notation as $\int_{\mu.y} h.x\ dx$.

As an example of how this can get out of hand, consider the expressions
$$\int_{X_1}\mu_{x_1}\,\mathrm{d}(\pi_1)_*(\mu)(x_1) \quad\text{and}\quad \int_{X_1}\left(\int_{\pi_1^{-1}(x_1)} f(x_1,x_2)\,\mu(\mathrm{d}x_2\mid x_1)\right)\mu\big(\pi_1^{-1}(\mathrm{d}x_1)\big)~,$$
taken from the product-space section of the Wikipedia entry on the Disintegration Theorem [1, .../Disintegration theorem]. It is reminiscent of "von Neumann's onion", so called because "it has to be peeled before it can be digested" [17, The legend of John von Neumann].

Footnote: This avoids all the contortions one reads, like $d\mu(x)$ and $\mu(dx)$, as ad-hoc variations on the "normal" use, varying from text to text depending on the complexity of their calculations.
B.3 Convex functions and Jensen's inequality

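We will use a convex function $y$ on $\mathbb{D}\mathbf{D}$, i.e. one with the property that for $0<p<1$ and $\delta_{\{1,2\}}{:}\,\mathbb{D}\mathbf{D}$ we have $y.(\delta_1\,{}_p{\oplus}\,\delta_2)\le y.\delta_1\,{}_p{\oplus}\,y.\delta_2$, where ${}_p{\oplus}$ is the $p$-weighted average $p\cdot(\text{left})+(1-p)\cdot(\text{right})$. It is strictly convex if the inequality is strict whenever $\delta_1\neq\delta_2$ and $p\neq 0,1$. We define the strictly convex $y.\delta := \sum_d(\delta.d)^2$, motivated by the "colour" construction for inhomogeneous Markov chains [18].

From Jensen's inequality [31, Thm. 2] we have for convex $y$ and hyper $\Delta$ that $\int_\Delta y\ge y.(\mathrm{avg}.\Delta)$. Defining $Y.\Delta := \int_\Delta y$, we write this $Y.\Delta\ge y.(\boldsymbol{\mu}.\Delta)$.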
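The sketch below checks the convexity inequality and Jensen's inequality for $y.\delta := \sum_d(\delta.d)^2$ on a two-point example. As an assumption of ours, a finite hyper is represented as a list of (weight, distribution) pairs; the names are illustrative.

```python
# Sketch under our own assumptions: hypers as lists of (weight, dist) pairs.
from fractions import Fraction as F

def y(delta):
    """y.delta := sum over d of (delta.d)^2, strictly convex on distributions."""
    return sum((p * p for p in delta.values()), F(0))

def mix(d1, d2, p):
    """d1 p(+) d2, the convex combination p*d1 + (1-p)*d2."""
    return {k: p * d1.get(k, 0) + (1 - p) * d2.get(k, 0) for k in set(d1) | set(d2)}

def Y(hyper):
    """Y.Delta := integral of y over Delta, for a finite hyper."""
    return sum((w * y(d) for w, d in hyper), F(0))

def avg(hyper):
    out = {}
    for w, d in hyper:
        for k, p in d.items():
            out[k] = out.get(k, 0) + w * p
    return out

d1, d2, half = {"h": F(1)}, {"t": F(1)}, F(1, 2)
print(y(mix(d1, d2, half)), half * y(d1) + (1 - half) * y(d2))   # 1/2 1
hyper = [(half, d1), (half, d2)]
print(Y(hyper) >= y(avg(hyper)))   # True (Jensen)
```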



B.4 Entropy refinement does not increase Y 



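We consider for $\Delta_{\{S,I\}}$ in $\mathbb{M}\mathbb{D}\mathbf{D}$ the entropy refinement $\Delta_S\sqsubseteq_{\mathrm{E}}\Delta_I$, beginning with the first of the two criteria from Def. 5.1, that there be a super $\boldsymbol{\Delta}$ in $\mathbb{M}^2\mathbb{D}\mathbf{D}$ with $\Delta_S=\mathrm{avg}.\boldsymbol{\Delta}$. We calculate
$$\begin{array}{cll}
 & Y.\Delta_S \\
= & \int_{\mathrm{avg}.\boldsymbol{\Delta}} y \\
= & \int_{\boldsymbol{\Delta}}\left(\int_\Delta y\right)d\Delta & \text{[18, Thm. 1(d)]} \\
= & \int_{\boldsymbol{\Delta}} Y.\Delta\ d\Delta \\
= & \int_{\boldsymbol{\Delta}} Y~.
\end{array}$$

From the second criterion of refinement, that $\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}=\Delta_I$, we calculate
$$\begin{array}{cll}
 & Y.\Delta_I \\
= & Y.(\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}) \\
= & \int_{\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}} y \\
= & \int_{\boldsymbol{\Delta}} y\circ\mathrm{avg} & \text{[18, Thm. 1(a)]} \\
= & \int_{\boldsymbol{\Delta}} y.(\mathrm{avg}.\Delta)\ d\Delta \\
\le & \int_{\boldsymbol{\Delta}} Y.\Delta\ d\Delta & y \text{ is convex} \\
= & \int_{\boldsymbol{\Delta}} Y~.
\end{array}$$

Putting the two calculations together gives us $Y.\Delta_S\ge Y.\Delta_I$ immediately.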
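Continuing the finite sketches (same assumptions: a super is a list of (weight, hyper) pairs, a hyper a list of (weight, distribution) pairs, names illustrative), we can compute $Y.\Delta_S$ and $Y.\Delta_I$ for the two hypers a super witnesses and observe the inequality just derived on one example.

```python
# Sketch under our own assumptions: supers/hypers as nested (weight, ...) lists.
from fractions import Fraction as F

def y(delta):
    return sum((p * p for p in delta.values()), F(0))

def avg(hyper):
    """Average of a finite hyper given as (weight, distribution) pairs."""
    out = {}
    for w, d in hyper:
        for k, p in d.items():
            out[k] = out.get(k, 0) + w * p
    return out

def Y(hyper):
    return sum((w * y(d) for w, d in hyper), F(0))

def witnessed_pair(sup):
    """Delta_S = avg.sup (flattened) and Delta_I = map.avg.sup."""
    delta_S = [(w * v, d) for w, h in sup for v, d in h]
    delta_I = [(w, avg(h)) for w, h in sup]
    return delta_S, delta_I

dh, dt = {"h": F(1)}, {"t": F(1)}
u = {"h": F(1, 2), "t": F(1, 2)}
sup = [(F(1, 2), [(F(1, 2), dh), (F(1, 2), dt)]), (F(1, 2), [(F(1), u)])]
delta_S, delta_I = witnessed_pair(sup)
print(Y(delta_S), Y(delta_I))   # 3/4 1/2
```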



B.5 Conditions for strict decrease of Y 

The step /^y.(avg.A) dA < J^Y.A dA depends only on the underlying in- 
equality that y. (avg.A) < Y.A for all A, and so we will have equality there just 
when the set of A's such that y. (avg.A) = Y.A has measure one in A. But 
A satisfies that equality precisely when it is of the form pnt.S, that is for 
some 6 in DX, i.e. is a point-hyper centred on that 6 j31[ Thm.5]. Thus we have 
equality iff the set P:= \S:DX • pnt.S^ of point hypers in MDA" has measure 
one in the super A.p^ 

Footnote: $P$ is closed because it contains its limit points in the Kantorovich metric, hence is measurable.
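But if $\boldsymbol{\Delta}.P=1$ then $\mathrm{avg}.\boldsymbol{\Delta}=\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}$, whence $\Delta_S=\Delta_I$. That gives us that $\Delta_S\sqsubseteq_{\mathrm{E}}\Delta_I$ implies $Y.\Delta_S>Y.\Delta_I$ unless $\Delta_S=\Delta_I$, whence antisymmetry of entropy refinement follows trivially: if $\Delta_S\sqsubseteq_{\mathrm{E}}\Delta_I\sqsubseteq_{\mathrm{E}}\Delta_S$ with $\Delta_S\neq\Delta_I$, we would have $Y.\Delta_S>Y.\Delta_I>Y.\Delta_S$.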
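The equality condition can be observed in the same finite setting (our own representation and hypothetical names): with the strictly convex $y$ of §B.3, the Jensen gap $Y.\Delta-y.(\mathrm{avg}.\Delta)$ is zero for a hyper whose weight all sits on one posterior, and positive otherwise. This is a check on two examples, not a proof.

```python
# Sketch under our own assumptions: hypers as lists of (weight, dist) pairs.
from fractions import Fraction as F

def y(delta):
    return sum((p * p for p in delta.values()), F(0))

def avg(hyper):
    out = {}
    for w, d in hyper:
        for k, p in d.items():
            out[k] = out.get(k, 0) + w * p
    return out

def Y(hyper):
    return sum((w * y(d) for w, d in hyper), F(0))

def is_point_hyper(hyper):
    """True when all positive weight sits on a single distribution."""
    dists = [{k: p for k, p in d.items() if p != 0} for w, d in hyper if w != 0]
    return all(d == dists[0] for d in dists)

def jensen_gap(hyper):
    """Y.Delta - y.(avg.Delta); zero exactly for point hypers when y is strictly convex."""
    return Y(hyper) - y(avg(hyper))

u = {"h": F(1, 2), "t": F(1, 2)}
point = [(F(1, 3), u), (F(2, 3), dict(u))]          # the same posterior twice
spread = [(F(1, 2), {"h": F(1)}), (F(1, 2), {"t": F(1)})]
print(is_point_hyper(point), jensen_gap(point))     # True 0
print(is_point_hyper(spread), jensen_gap(spread))   # False 1/2
```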



B.6 Antisymmetry of secure refinement 

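This now follows easily from the above, because $\Delta_S\sqsubseteq\Delta_I\sqsubseteq\Delta_S$ implies $\sum\Delta_S\le\sum\Delta_I\le\sum\Delta_S$ (where $\sum\Delta$ is the total weight of the subprobability hyper $\Delta$), whence $\sum\Delta_S=\sum\Delta_I$ and so $\Delta_S\sqsubseteq_{\mathrm{E}}\Delta_I\sqsubseteq_{\mathrm{E}}\Delta_S$, thus finally $\Delta_S=\Delta_I$ by antisymmetry of entropy refinement (§B.5).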



C Transitivity of secure refinement 

We prove transitivity of refinement for three domains of increasing sophistica- 
tion, taking advantage of the similarities between them to make the structure of 
the proof clearer. Although direct matrix-based proofs are possible in the dis- 
crete case, our aim is to use monadic-style arguments that are easily generalised 
to measures. 



C.1 A useful conjecture

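If we had the following property in our monad, transitivity of entropy refinement would be straightforward:

Conjecture C.1 Suppose we have a hyper $\Delta$ in $\mathbb{D}^2\mathcal{X}$ and two supers $\boldsymbol{\Delta}_{\{1,2\}}$ in $\mathbb{D}^3\mathcal{X}$ with the property that $\mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}_1=\Delta$ and $\Delta=\mathrm{avg}.\boldsymbol{\Delta}_2$. Then there is a "mega" $V$ in $\mathbb{D}^4\mathcal{X}$ such that $\boldsymbol{\Delta}_1=\mathrm{avg}.V$ and $\mathrm{map}^2.\mathrm{avg}.V=\boldsymbol{\Delta}_2$. □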

Fig. 4 further below gives a diagram of this relationship (but in more mathematical notation). With Conj. C.1 the proof of transitivity of $(\sqsubseteq_{\mathrm{E}})$ would be as follows.



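Lemma C.1 (Refinement is transitive) If $\Delta_1\sqsubseteq_{\mathrm{E}}\Delta_2$ and $\Delta_2\sqsubseteq_{\mathrm{E}}\Delta_3$ for hypers $\Delta_{\{1,2,3\}}$ in $\mathbb{D}^2\mathcal{X}$, then $\Delta_1\sqsubseteq_{\mathrm{E}}\Delta_3$.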

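Footnote: If $\boldsymbol{\Delta}.P=1$ then $\boldsymbol{\Delta}=\mathrm{map}.\mathrm{pnt}.\Delta'$ for some hyper $\Delta'$, whence
$$\begin{array}{rcl}
\mathrm{avg}.\boldsymbol{\Delta} &=& \mathrm{avg}.(\mathrm{map}.\mathrm{pnt}.\Delta') \;=\; (\mathrm{avg}\circ\mathrm{map}.\mathrm{pnt}).\Delta' \;=\; \Delta' \\
&=& \mathrm{map}.(\mathrm{avg}\circ\mathrm{pnt}).\Delta' \;=\; \mathrm{map}.\mathrm{avg}.(\mathrm{map}.\mathrm{pnt}.\Delta') \;=\; \mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}~.
\end{array}$$
Alternatively, for a measurable set $Q\subseteq P$ of hypers we calculate directly
$$\begin{array}{cll}
 & \boldsymbol{\Delta}.Q \\
= & \boldsymbol{\Delta}.\{\Delta{:}P\mid(\mathrm{pnt}\circ\mathrm{avg}).\Delta\in Q\} & Q\subseteq P;\ \mathrm{pnt}\circ\mathrm{avg}\text{ is the identity on }P \\
= & \boldsymbol{\Delta}.\{\Delta{:}\mathbb{M}\mathbb{D}\mathcal{X}\mid(\mathrm{pnt}\circ\mathrm{avg}).\Delta\in Q\} & \boldsymbol{\Delta}.P=1 \\
= & \mathrm{map}.(\mathrm{pnt}\circ\mathrm{avg}).\boldsymbol{\Delta}.Q~,
\end{array}$$
whence, because $\boldsymbol{\Delta}.P=1$, we can ignore the assumption $Q\subseteq P$ to conclude that $\boldsymbol{\Delta}=\mathrm{map}.(\mathrm{pnt}\circ\mathrm{avg}).\boldsymbol{\Delta}$, i.e. that $\boldsymbol{\Delta}=\mathrm{map}.\mathrm{pnt}.\Delta_I$. Then $\Delta_S=\mathrm{avg}.\boldsymbol{\Delta}=\Delta_I$.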
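Proof: From Def. B.1 and Conj. C.1 we have $\boldsymbol{\Delta}_{\{1,2\}}$ in $\mathbb{D}^3\mathcal{X}$ and $V$ in $\mathbb{D}^4\mathcal{X}$ with
$$\begin{array}{rcl}
\Delta_1=\mathrm{avg}.\boldsymbol{\Delta}_1 &\land& \mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}_1=\Delta_2 \\
\Delta_2=\mathrm{avg}.\boldsymbol{\Delta}_2 &\land& \mathrm{map}.\mathrm{avg}.\boldsymbol{\Delta}_2=\Delta_3 \\
\boldsymbol{\Delta}_1=\mathrm{avg}.V &\land& \mathrm{map}^2.\mathrm{avg}.V=\boldsymbol{\Delta}_2~.
\end{array}$$
Monad laws then give $\Delta_1=\mathrm{avg}.(\mathrm{map}.\mathrm{avg}.V)$ and $\mathrm{map}.\mathrm{avg}.(\mathrm{map}.\mathrm{avg}.V)=\Delta_3$, so that $\Delta_1\sqsubseteq_{\mathrm{E}}\Delta_3$ is established with witness $\mathrm{map}.\mathrm{avg}.V$. □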



That of course leaves the proof of Conj. C.1, so there is still some work to do: we explore the conjecture in three stages.



C.2 Proof of Conj. C.1 for a qualitative model



Here we attempt the proof of our conjecture in the qualitative model of noninterference and refinement [35, ...], i.e. for sets instead of distributions. (Refinement is already known to be transitive for that model; we are redoing it in a different way in order to bolster our intuition for its generalisation.)

For succinctness, we will from here on use more conventional monad notation: the functor (map) is $\mathbb{M}$ and the multiplication transformation (avg) is $\boldsymbol{\mu}$.

The first move is to make the conjecture slightly more general, hence in fact simpler: we assume we have $\mathbb{M}f.X=B=\boldsymbol{\mu}.Y$ for some $B, X, Y$ and $f$, all of the right types. We will find $Z$ such that $\boldsymbol{\mu}.Z=X$ and $\mathbb{M}^2f.Z=Y$. Using a general $f$, instead of $\boldsymbol{\mu}$ as in Conj. C.1 specifically, means we can think one level lower than before: we now have that $B, X$ are just sets, and $Y$ and $Z$ (the latter effectively the $V$ of Conj. C.1, as we will see) are simply sets of sets.

With this setup, the construction of $Z$ is very straightforward: it is the set of sets $\{y{:}Y\cdot\{x{:}X\mid f.x\in y\}\}$. We calculate first



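$$\begin{array}{cll}
 & \boldsymbol{\mu}.Z \\
= & \{z{:}Z;\ x'{:}z\cdot x'\} \\
= & \{z{:}\{y{:}Y\cdot\{x{:}X\mid f.x\in y\}\};\ x'{:}z\cdot x'\} & \text{defn } Z \\
= & \{y{:}Y;\ x'{:}\{x{:}X\mid f.x\in y\}\cdot x'\} \\
= & \{y{:}Y;\ x{:}X\mid f.x\in y\cdot x\} \\
= & \{x{:}X\mid(\exists y{:}Y\mid f.x\in y)\} \\
= & X~. & \mathbb{M}f.X=\boldsymbol{\mu}.Y
\end{array}$$

The other calculation is
$$\begin{array}{cll}
 & \mathbb{M}^2f.Z \\
= & \{y{:}Y\cdot\{x{:}X\mid f.x\in y\cdot f.x\}\} & \text{defn } Z \\
= & \{y{:}Y\cdot\{x{:}X;\ b\mid b=f.x\land b\in y\cdot b\}\} & \text{one-point rule} \\
= & \{y{:}Y\cdot\{b{:}y\mid(\exists x{:}X\mid b=f.x)\}\} \\
= & \{y{:}Y\cdot y\} & \mathbb{M}f.X=\boldsymbol{\mu}.Y \\
= & Y~.
\end{array}$$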



That indeed proves Conj. C.1 for the qualitative model if we instantiate $f$ to $\boldsymbol{\mu}$. Based on it, we now turn to the discrete quantitative model, as in this report.
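The qualitative construction is easy to run with Python frozensets. The sketch below, using our own (hypothetical) helper names, builds $Z$ for a small example with $f$ the parity function, and checks the hypothesis and both conclusions $\boldsymbol{\mu}.Z=X$ and $\mathbb{M}^2f.Z=Y$.

```python
# Sketch under our own assumptions: sets as frozensets; names illustrative.
def construct_Z(X, Y, f):
    """Z := { y:Y . { x:X | f.x in y } }."""
    return frozenset(frozenset(x for x in X if f(x) in y) for y in Y)

def mu(S):
    """Multiplication for the powerset monad: the union of a set of sets."""
    return frozenset().union(*S)

def M(f, S):
    """The functor on arrows: the image of S under f."""
    return frozenset(f(s) for s in S)

X = frozenset({0, 1, 2, 3})
f = lambda x: x % 2
Y = frozenset({frozenset({0}), frozenset({0, 1})})
print(M(f, X) == mu(Y))                  # True: hypothesis Mf.X = B = mu.Y
Z = construct_Z(X, Y, f)
print(mu(Z) == X)                        # True
print(M(lambda z: M(f, z), Z) == Y)      # True: M^2 f.Z = Y
```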
C.3 Proof of Conj. C.1 for a quantitative model

Again we generalise, so that $\mathbb{M}f.X=B=\boldsymbol{\mu}.Y$ with $B, X$ distributions and $Y, Z$ hypers. This time the construction of $Z$ is
$$Z := \{\!\{\,y{:}Y\cdot\{\!\{\,x{:}X\mid y.(f.x)/B.(f.x)\,\}\!\}\,\}\!\}~,$$
where we are able to exploit the (deliberate) similarity of the notations in §8.1 with the ordinary, established comprehension and enumeration notation of set theory that we used in §C.2 just above. We calculate first



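$$\begin{array}{cll}
 & \boldsymbol{\mu}.Z \\
= & \{\!\{\,z{:}Z;\ x'{:}z\cdot x'\,\}\!\} \\
= & \{\!\{\,z{:}\{\!\{\,y{:}Y\cdot\{\!\{\,x{:}X\mid y.(f.x)/B.(f.x)\,\}\!\}\,\}\!\};\ x'{:}z\cdot x'\,\}\!\} & \text{defn } Z \\
= & \{\!\{\,y{:}Y;\ x'{:}\{\!\{\,x{:}X\mid y.(f.x)/B.(f.x)\,\}\!\}\cdot x'\,\}\!\} \\
= & \{\!\{\,y{:}Y;\ x'{:}(\mathcal{E}\,x{:}X\cdot y.(f.x)/B.(f.x)\times[x])\cdot x'\,\}\!\} & \text{see below} \\
= & (\mathcal{E}\,y{:}Y;\ x{:}X\cdot y.(f.x)/B.(f.x)\times[x]) \\
= & (\mathcal{E}\,x{:}X\cdot[x]\times(\mathcal{E}\,y{:}Y\cdot y.(f.x))/B.(f.x)) \\
= & (\mathcal{E}\,x{:}X\cdot[x]\times B.(f.x)/B.(f.x)) & B=\boldsymbol{\mu}.Y \\
= & (\mathcal{E}\,x{:}X\cdot[x]) \\
= & X~.
\end{array}$$
Here $(\mathcal{E}\,x{:}X\cdot e)$ is the expected value (average) of $e$ as $x$ is drawn from $X$, and $[x]$ is the point distribution at $x$.

This calculation is not ideal: it tries to follow the calculation in §C.2, but it needs some extra steps. For the "see below" we calculate
$$\begin{array}{cll}
 & (\mathcal{E}\,x{:}X\cdot y.(f.x)/B.(f.x)) \\
= & (\mathcal{E}\,b{:}B\cdot y.b/B.b) & B=\mathbb{M}f.X \\
= & (\textstyle\sum b{:}\lceil B\rceil\cdot y.b) \\
= & 1~, & y \text{ is total}
\end{array}$$
where $\lceil B\rceil$ is the support of $B$, which contains that of $y$ because $B=\boldsymbol{\mu}.Y$; so the denominator of the conditional comprehension can be removed. We use this same identity below for the second calculation, reasoning
$$\begin{array}{cll}
 & \mathbb{M}^2f.Z \\
= & \{\!\{\,y{:}Y\cdot\{\!\{\,x{:}X\mid y.(f.x)/B.(f.x)\cdot f.x\,\}\!\}\,\}\!\} & \text{defn } Z \\
= & \{\!\{\,y{:}Y\cdot(\mathcal{E}\,x{:}X\cdot y.(f.x)/B.(f.x)\times[f.x])\,\}\!\} & \text{see above} \\
= & \{\!\{\,y{:}Y\cdot(\mathcal{E}\,b{:}B\cdot y.b/B.b\times[b])\,\}\!\} & B=\mathbb{M}f.X \\
= & \{\!\{\,y{:}Y\cdot y\,\}\!\} \\
= & Y~.
\end{array}$$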



That proves Conj. C.1 for the discrete-distribution case, sufficient in fact for this report. We now turn to proper measures.
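The quantitative construction of $Z$ can likewise be checked on a small example with exact arithmetic. In this sketch (representation and names are our own assumptions) each $y$ in $Y$ becomes the distribution on $X$ giving $x$ the mass $X.x\times y.(f.x)/B.(f.x)$; those masses already sum to one, which is the "see below" identity.

```python
# Sketch under our own assumptions: hypers as dicts keyed by frozen
# distributions; all names are illustrative.
from fractions import Fraction as F

def freeze(d):
    return frozenset((k, v) for k, v in d.items() if v != 0)

def push(f, d):
    """(M f).d for a finite distribution d."""
    out = {}
    for x, p in d.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

def avg(h):
    """mu for a hyper given as a dict from frozen distributions to weights."""
    out = {}
    for inner, w in h.items():
        for x, p in inner:
            out[x] = out.get(x, 0) + w * p
    return out

def construct_Z(X, Y, f):
    """Z := {| y:Y . {| x:X | y.(f.x)/B.(f.x) |} |} with B = (M f).X."""
    B = push(f, X)
    Z = {}
    for y_key, w in Y.items():
        y = dict(y_key)
        z = freeze({x: p * y.get(f(x), 0) / B[f(x)] for x, p in X.items()})
        Z[z] = Z.get(z, 0) + w
    return Z

def M2f(f, Z):
    """(M^2 f).Z: push every distribution in Z forward along f."""
    out = {}
    for z_key, w in Z.items():
        key = freeze(push(f, dict(z_key)))
        out[key] = out.get(key, 0) + w
    return out

X = {x: F(1, 4) for x in range(4)}
f = lambda x: x % 2
Y = {freeze({0: F(3, 4), 1: F(1, 4)}): F(1, 2),
     freeze({0: F(1, 4), 1: F(3, 4)}): F(1, 2)}
assert push(f, X) == avg(Y)          # the hypothesis (M f).X = B = mu.Y
Z = construct_Z(X, Y, f)
print(avg(Z) == X, M2f(f, Z) == Y)   # True True
```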



C.4 Refinement for proper measures 
C.4.1 Product measures and conditionals 

We begin with three measurable spaces $(\mathbf{A},\mathcal{A})$ and $(\mathbf{B},\mathcal{B})$, with $(\mathbf{G},\mathcal{G})$ being their product. The Product Measure Theorem [41, p. 97] assumes a measure $\alpha$ and a function $f{:}\,\mathbf{A}\to\mathbb{M}\underline{\mathbf{B}}$, which is equivalently of type $\mathbf{A}\times\mathcal{B}\to\mathbb{R}$, with the further property that for any fixed $B\in\mathcal{B}$ the function $f.(\cdot).B$ is measurable on $\mathbf{A}$, so that integrations $\int_\alpha(f.a.B)\ da$ are meaningful. (There are also various assumptions about the measures' being finite, which we satisfy because we are using probability measures, thus bounded by 1.)

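Under those assumptions there is a unique product measure $\gamma$ on $\mathcal{G}:=\mathcal{A}\times\mathcal{B}$ satisfying
$$\gamma.(A\times B)=\int_\alpha^A f.a.B\ da \qquad\text{for all measurable } A, B \text{ in } \mathcal{A},\mathcal{B}~, \qquad (50)$$
given by $\gamma.G := \int_\alpha f.a.G_a\ da$, where $G_a$ is the "section" $\{b{:}\mathbf{B}\mid(a,b)\in G\}$ of $G$ at $a$.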

The function $f$ is like a conditional probability for (the constructed) product measure $\gamma$, giving for each (second-coordinate measurable set) $B$ a probability conditioned on (first-coordinate point) $a$.
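In the discrete case the Product Measure Theorem reduces to building each row of a joint matrix as $\gamma.(a,b) := \alpha.a\times f.a.b$. A minimal sketch under our own assumptions (finite supports, dictionaries, illustrative names) that builds $\gamma$ and checks (50) on one rectangle:

```python
# Sketch under our own assumptions: finite distributions as dicts; names illustrative.
from fractions import Fraction as F

def product_measure(alpha, f):
    """gamma.(a, b) := alpha.a * f.a.b, where f.a is a distribution on B."""
    return {(a, b): pa * pb for a, pa in alpha.items() for b, pb in f(a).items()}

def measure(m, S):
    """m.S for a finite set S of points."""
    return sum((w for x, w in m.items() if x in S), F(0))

alpha = {"a1": F(1, 4), "a2": F(3, 4)}
rows = {"a1": {"b1": F(1)}, "a2": {"b1": F(1, 3), "b2": F(2, 3)}}
gamma = product_measure(alpha, rows.get)

# Check (50) on the rectangle A x B with A = {a1, a2} and B = {b1}.
A, B = {"a1", "a2"}, {"b1"}
lhs = measure(gamma, {(a, b) for a in A for b in B})
rhs = sum((alpha[a] * measure(rows[a], B) for a in A), F(0))
print(lhs, rhs)   # 1/2 1/2
```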

The Disintegration Theorem [1, .../Disintegration_theorem] goes in the other direction. (It is a specialisation of techniques for Regular Conditional Probabilities [..., §6.6], [..., Prop. 10.2.8].) Here, in essence, we begin with a $\gamma$ as above and try to find a suitable $f$ that satisfies (50). For this to work, $(\mathbf{A},\mathcal{A})$ and $(\mathbf{B},\mathcal{B})$ must be Radon spaces.

The theorem says that for any measure $\gamma$ on the product $\mathcal{G}=\mathcal{A}\times\mathcal{B}$ there is an $f{:}\,\mathbf{A}\to\mathbb{M}\underline{\mathbf{B}}$, depending on $\gamma$, such that (50) holds with $\alpha$ being the marginal measure $\gamma_{\mathbf{A}}$ of $\gamma$ on its first coordinate, that is given by $\alpha.A := \gamma.(A\times\mathbf{B})$. Furthermore, this $f$ is "almost uniquely determined" in the sense that any other $f'$ satisfying (50) agrees with $f$ except possibly on a subset of $\mathbf{A}$ with $\alpha$-measure zero: we will write that as $f\approx_\alpha f'$, meaningful only when $f, f'$ are functions on $\mathbf{A}$.

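For notational convenience, given some $\gamma$ we will write $\gamma_{\mathbf{B}|\mathbf{A}}$ for such an $f$ (a member of the equivalence class), being careful in that case to use only operations on $\gamma_{\mathbf{B}|\mathbf{A}}$ for which the class is a congruence; similarly we write $\gamma_{\mathbf{A}|\mathbf{B}}$ for such a function on $\mathbf{B}$. Thus we have, for any $\gamma$ with marginals $\alpha=\gamma_{\mathbf{A}}$ and $\beta=\gamma_{\mathbf{B}}$, the existence of $\gamma_{\mathbf{B}|\mathbf{A}}$ and $\gamma_{\mathbf{A}|\mathbf{B}}$ such that for all measurable sets $A, B$ in $\mathcal{A},\mathcal{B}$ we have
$$\gamma.(A\times B)=\int_\alpha^A\gamma_{\mathbf{B}|\mathbf{A}}.a.B\ da=\int_\beta^B\gamma_{\mathbf{A}|\mathbf{B}}.b.A\ db~. \qquad (51)$$
(We note that the integrations are indeed congruential operations.) In the $\gamma_{\mathbf{A}|\mathbf{B}}$ case the order of arguments is $b, A$ (rather than $a, B$), so that the point always comes first and then the measurable set.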
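Conversely, a finite joint distribution can be disintegrated by normalising its rows and columns. The sketch below (our own, for finite supports; names are hypothetical) computes both families of conditionals and checks the three sides of (51) on one rectangle.

```python
# Sketch under our own assumptions: a joint distribution is a dict on pairs.
from fractions import Fraction as F

def marginals(gamma):
    """Left and right marginals of a joint distribution on pairs (a, b)."""
    alpha, beta = {}, {}
    for (a, b), w in gamma.items():
        alpha[a] = alpha.get(a, 0) + w
        beta[b] = beta.get(b, 0) + w
    return alpha, beta

def conditionals(gamma):
    """Normalised rows gamma_{B|A}.a and normalised columns gamma_{A|B}.b."""
    alpha, beta = marginals(gamma)
    rows, cols = {}, {}
    for (a, b), w in gamma.items():
        if w == 0:
            continue          # zero entries contribute to no conditional
        rows.setdefault(a, {})[b] = w / alpha[a]
        cols.setdefault(b, {})[a] = w / beta[b]
    return rows, cols

def measure(m, S):
    return sum((w for x, w in m.items() if x in S), F(0))

gamma = {("a1", "b1"): F(1, 4), ("a2", "b1"): F(1, 4), ("a2", "b2"): F(1, 2)}
alpha, beta = marginals(gamma)
rows, cols = conditionals(gamma)
A, B = {"a2"}, {"b1", "b2"}
# The three sides of (51) on the rectangle A x B:
print(measure(gamma, {(a, b) for a in A for b in B}),
      sum((alpha[a] * measure(rows[a], B) for a in A), F(0)),
      sum((beta[b] * measure(cols[b], A) for b in B), F(0)))   # 3/4 3/4 3/4
```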



Footnote: Note that this product is not Cartesian: it is a product of sigma-algebras, thus the smallest sigma-algebra containing the Cartesian product.

Footnote: By analogy, in the discrete case we would have $\alpha$ as a column vector on the left and, for each row index $a$, the function $f.a$ (that is $f(a,\cdot)$) would be a normalised row for that index. The induced "denormalised" row $\alpha.a\times f.a$ then gives the actual row for that index $a$, and all those rows piled up together give the matrix for the product distribution $\gamma$.

Footnote: For discrete measures this is saying that if we have a joint-distribution matrix $\gamma$ over $\mathcal{G}=\mathcal{A}\times\mathcal{B}$ then it can be presented as an $a$-indexed collection of normalised rows $\gamma_{\mathbf{B}|\mathbf{A}}.a$ and also as a $b$-indexed collection of normalised columns $\gamma_{\mathbf{A}|\mathbf{B}}.b$, i.e. either way as we prefer.






As an example of the above, we give two lemmas that we will need later. Recall that (bold) $\boldsymbol{\mu}$ is multiplication from the probabilistic monad (i.e. it is not some measure $\mu$): it "averages" a measure of measures to give a single measure again.

The first lemma would say in discrete terms that if you have a joint distribution matrix $\varepsilon$ over $\mathcal{E}=\mathcal{D}\times\mathcal{B}$ with marginals $\delta,\beta$, and you "relabel" the right marginal $\beta$ by naming its columns by the columns' (normalised) values themselves, giving a relabelled distribution $\zeta$, then in fact $\zeta$ averages to $\delta$. This is actually how you can convert a hyper presented via an index set ($\mathbf{B}$) into a "real" hyper given directly as a measure of measures, i.e. with no index set having to be defined separately.
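A finite sketch of this relabelling, under our own assumptions (a joint distribution as a dictionary on pairs, hypothetical names): each column of $\varepsilon$ is renamed by its own normalised value, keeping the column's weight, and the resulting hyper $\zeta$ averages back to the left marginal $\delta$, as Lemma C.2 states.

```python
# Sketch under our own assumptions: finite joint distributions as dicts.
from fractions import Fraction as F

def freeze(d):
    return frozenset((k, v) for k, v in d.items() if v != 0)

def relabelled_hyper(eps):
    """zeta := (M eps_{D|B}).beta: each column b of eps is renamed by its
    normalised value, keeping weight beta.b (equal columns are merged)."""
    beta = {}
    for (d, b), w in eps.items():
        beta[b] = beta.get(b, 0) + w
    zeta = {}
    for b, wb in beta.items():
        if wb == 0:
            continue
        column = freeze({d: w / wb for (d, b2), w in eps.items() if b2 == b})
        zeta[column] = zeta.get(column, 0) + wb
    return zeta

def avg(h):
    out = {}
    for inner, w in h.items():
        for x, p in inner:
            out[x] = out.get(x, 0) + w * p
    return out

eps = {("d1", "b1"): F(1, 4), ("d2", "b1"): F(1, 4), ("d1", "b2"): F(1, 2)}
zeta = relabelled_hyper(eps)
delta = {"d1": F(3, 4), "d2": F(1, 4)}   # the left marginal of eps
print(avg(zeta) == delta)                # True
```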

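Lemma C.2 (Average of conditionals) Let $\varepsilon$ be a joint measure over $\mathcal{E}=\mathcal{D}\times\mathcal{B}$ with marginals $\delta=\varepsilon_{\mathbf{D}}$ and $\beta=\varepsilon_{\mathbf{B}}$. Note that $\varepsilon_{\mathbf{D}|\mathbf{B}}$ is of type $\mathbf{B}\to\mathbb{M}\underline{\mathbf{D}}$, so that $\mathbb{M}\varepsilon_{\mathbf{D}|\mathbf{B}}$ is of type $\mathbb{M}\underline{\mathbf{B}}\to\mathbb{M}^2\underline{\mathbf{D}}$, and construct the measure $\zeta:=\mathbb{M}\varepsilon_{\mathbf{D}|\mathbf{B}}.\beta$ in $\mathcal{Z}:=\mathbb{M}^2\underline{\mathbf{D}}$.

Then we have $\delta=\boldsymbol{\mu}.\zeta$.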
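Proof: Calculate for any measurable $D$ in $\mathcal{D}$ that
$$\begin{array}{cll}
 & (\boldsymbol{\mu}.\zeta).D \\
= & \int_\zeta\delta'.D\ d\delta' & \text{defn } \boldsymbol{\mu} \\
= & \int_\beta\varepsilon_{\mathbf{D}|\mathbf{B}}.b.D\ db & \text{defn } \zeta\text{ (change of variables)} \\
= & \varepsilon.(D\times\mathbf{B}) & (51) \\
= & \delta.D~, & \delta=\varepsilon_{\mathbf{D}}
\end{array}$$
which suffices, since $D$ was arbitrary. □

The second lemma concerns "mapping" of a joint distribution's conditionals. It would say in discrete terms that if you have a joint distribution matrix $\gamma$ over $\mathcal{G}=\mathcal{A}\times\mathcal{B}$ with marginals $\alpha,\beta$, but in fact the left marginal $\alpha$ is given as a "push-forward measure" $\mathbb{M}f.\delta$ via $f{:}\,\mathbf{D}\to\mathbf{A}$ from some other $\delta$, then you can make a new joint distribution $\varepsilon$ over $\mathbf{D}\times\mathbf{B}$ that relates $\delta,\beta$ directly, i.e. so that $\delta=\varepsilon_{\mathbf{D}}$ and $\beta=\varepsilon_{\mathbf{B}}$ and, moreover, the columns of $\varepsilon$, themselves distributions over $\mathbf{D}$, map (push forward) via $f$ to the corresponding columns of the original $\gamma$.

Lemma C.3 (Push-forward of conditionals) Let $\gamma$ be a joint measure over the sigma-algebra $\mathcal{G}=\mathcal{A}\times\mathcal{B}$ with marginals $\alpha=\gamma_{\mathbf{A}}$ and $\beta=\gamma_{\mathbf{B}}$, and let there be a further measure $\delta$ over $\mathcal{D}$ and function $f{:}\,\mathbf{D}\to\mathbf{A}$ such that $\alpha=\mathbb{M}f.\delta$.

Then there is a joint measure $\varepsilon$ over $\mathcal{E}=\mathcal{D}\times\mathcal{B}$ with marginals $\delta,\beta$ such that we have $\gamma_{\mathbf{A}|\mathbf{B}}\approx_\beta\mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}$.

Footnote: Recall that $(\approx_\beta)$ means "equal except on a set of $\beta$-measure zero", and that $(\circ)$ is functional composition.

Proof: Define $\varepsilon_{\mathbf{B}|\mathbf{D}}:=\gamma_{\mathbf{B}|\mathbf{A}}\circ f$, and observe trivially for the joint distribution $\varepsilon$ in $\mathbb{M}(\underline{\mathbf{D}}\times\underline{\mathbf{B}})$ induced via the Product Measure Theorem that indeed $\delta=\varepsilon_{\mathbf{D}}$ and $\beta=\varepsilon_{\mathbf{B}}$.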
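We now use the almost-unique property of $\gamma_{\mathbf{A}|\mathbf{B}}$, calculating for arbitrary $A{:}\,\mathcal{A}$ and $B{:}\,\mathcal{B}$ that
$$\begin{array}{cll}
 & \int_\beta^B(\mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}).b.A\ db \\
= & \int_\beta^B\mathbb{M}f.(\varepsilon_{\mathbf{D}|\mathbf{B}}.b).A\ db & \text{composition} \\
= & \int_\beta^B\varepsilon_{\mathbf{D}|\mathbf{B}}.b.(f^{-1}.A)\ db & \text{defn } \mathbb{M}f \\
= & \varepsilon.((f^{-1}.A)\times B) & \text{property of } \varepsilon_{\mathbf{D}|\mathbf{B}}\text{ wrt. }\varepsilon \\
= & \int_\delta^{f^{-1}.A}\varepsilon_{\mathbf{B}|\mathbf{D}}.d.B\ dd & \text{property of } \varepsilon_{\mathbf{B}|\mathbf{D}}\text{ wrt. }\varepsilon \\
= & \int_\delta^{f^{-1}.A}\gamma_{\mathbf{B}|\mathbf{A}}.(f.d).B\ dd & \text{defn } \varepsilon_{\mathbf{B}|\mathbf{D}} \\
= & \int_\alpha^A\gamma_{\mathbf{B}|\mathbf{A}}.a.B\ da & \text{chain rule, since } \alpha=\mathbb{M}f.\delta \\
= & \gamma.(A\times B)~, & \text{property of } \gamma_{\mathbf{B}|\mathbf{A}}\text{ wrt. }\gamma
\end{array}$$
which is the defining property, up to $(\approx_\beta)$, of $\gamma_{\mathbf{A}|\mathbf{B}}$. □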
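Finally, a finite sketch of Lem. C.3 under our own assumptions (joint distributions as dictionaries on pairs, $f$ given as a dictionary lookup, hypothetical names): set $\varepsilon.(d,b) := \delta.d\times\gamma_{\mathbf{B}|\mathbf{A}}.(f.d).b$; then the marginals of $\varepsilon$ are $\delta,\beta$, and each column of $\varepsilon$ pushes forward along $f$ to the corresponding column of $\gamma$. In the finite case $(\approx_\beta)$ is plain equality on the columns of positive weight.

```python
# Sketch under our own assumptions: finite joint distributions as dicts.
from fractions import Fraction as F

def marginals(joint):
    left, right = {}, {}
    for (x, b), w in joint.items():
        left[x] = left.get(x, 0) + w
        right[b] = right.get(b, 0) + w
    return left, right

def columns(joint):
    """Normalised columns joint_{X|B}.b, for columns of positive weight."""
    _, right = marginals(joint)
    cols = {}
    for (x, b), w in joint.items():
        if w != 0:
            cols.setdefault(b, {})[x] = w / right[b]
    return cols

def push(f, d):
    out = {}
    for x, p in d.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

def pull_back_joint(gamma, delta, f):
    """eps.(d, b) := delta.d * gamma_{B|A}.(f.d).b, i.e. eps_{B|D} = gamma_{B|A} o f."""
    alpha, _ = marginals(gamma)
    return {(d, b): pd * w / alpha[f(d)]
            for d, pd in delta.items()
            for (a, b), w in gamma.items() if a == f(d)}

gamma = {("a1", "b1"): F(1, 4), ("a2", "b1"): F(1, 4), ("a2", "b2"): F(1, 2)}
delta = {"d1": F(1, 4), "d2": F(1, 4), "d3": F(1, 2)}
f = {"d1": "a1", "d2": "a2", "d3": "a2"}.get
assert push(f, delta) == marginals(gamma)[0]      # alpha = (M f).delta
eps = pull_back_joint(gamma, delta, f)
print(marginals(eps)[0] == delta, marginals(eps)[1] == marginals(gamma)[1])   # True True
print(all(push(f, c) == columns(gamma)[b] for b, c in columns(eps).items()))  # True
```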



C.4.2 Proof of Conj. C.1

With the above preparation, we can now prove Conj. C.1 for general measures. We restate it using the notational conventions of this section:

Conjecture C.1 Suppose we have a measure $\alpha$ and two other measures $\delta,\beta$ with the properties that $\mathbb{M}f.\delta=\alpha$ and $\alpha=\boldsymbol{\mu}.\beta$, where $f{:}\,\mathbf{D}\to\mathbf{A}$.

Then there is a measure $\zeta$ in $\mathbb{M}^2\mathbf{D}$ such that $\delta=\boldsymbol{\mu}.\zeta$ and $\beta=\mathbb{M}^2f.\zeta$. □

Fig. 4 gives the relevant commuting diagram. Note, however, that the arrows $(\mapsto)$ are applied to the vertices: they are not functions between the vertices. Here is the proof:

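1. Because $\alpha=\boldsymbol{\mu}.\beta$ we can construct a joint $\gamma$ over $\mathcal{G}=\mathcal{A}\times\mathcal{B}$ with marginals $\alpha,\beta$ such that $\gamma_{\mathbf{A}|\mathbf{B}}=1$, the identity function on $\mathbf{B}$. This follows directly from Lem. C.2, since $\mathbb{M}1=1$.

2. Note at this point, just for keeping things straight, that since $\alpha=\boldsymbol{\mu}.\beta$ in fact $\mathbf{B}=\mathbb{M}\mathbf{A}$, that is elements $b{:}\,\mathbf{B}$ can be applied to measurable subsets $A{:}\,\mathcal{A}$ of $\mathbf{A}$.

3. Now, as in Lem. C.3, given our assumption $\alpha=\mathbb{M}f.\delta$, construct a joint $\varepsilon$ over $\mathcal{E}=\mathcal{D}\times\mathcal{B}$ with marginals $\delta,\beta$ and such that $\gamma_{\mathbf{A}|\mathbf{B}}\approx_\beta\mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}$.

4. And now, as in Lem. C.2, construct the measure $\zeta:=\mathbb{M}\varepsilon_{\mathbf{D}|\mathbf{B}}.\beta$. (For the types here, remember that $\beta\in\mathbb{M}\mathbf{B}=\mathbb{M}^2\mathbf{A}$ and $\varepsilon_{\mathbf{D}|\mathbf{B}}{:}\,\mathbf{B}\to\mathbb{M}\mathbf{D}$, and that is why $\zeta=\mathbb{M}\varepsilon_{\mathbf{D}|\mathbf{B}}.\beta$ is of type $\mathbb{M}^2\mathbf{D}$.)

5. Observe that we then have $\delta=\boldsymbol{\mu}.\zeta$ directly from Lem. C.2.

Footnote: ...or it can be calculated directly: define $\gamma_{\mathbf{A}|\mathbf{B}}.b.A:=b.A$ and then verify that indeed we have $\gamma_{\mathbf{A}}=\boldsymbol{\mu}.\beta$.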
Figure 4: Conjecture C.1. The conjecture establishes the existence of the upper half, $\zeta$ such that $\delta=\boldsymbol{\mu}.\zeta$ and $\mathbb{M}^2f.\zeta=\beta$, given the lower half $\mathbb{M}f.\delta=\alpha=\boldsymbol{\mu}.\beta$.



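6. Finally, calculate
$$\begin{array}{cll}
 & \mathbb{M}^2f.\zeta \\
= & \mathbb{M}^2f.(\mathbb{M}\varepsilon_{\mathbf{D}|\mathbf{B}}.\beta) & \text{defn } \zeta \\
= & \mathbb{M}(\mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}).\beta & \text{functor} \\
= & \mathbb{M}\gamma_{\mathbf{A}|\mathbf{B}}.\beta & \text{(3) above gives } \mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}\approx_\beta\gamma_{\mathbf{A}|\mathbf{B}}\text{; see below} \\
= & \beta~. & \text{(1) above: } \mathbb{M}\gamma_{\mathbf{A}|\mathbf{B}}=\mathbb{M}1=1
\end{array}$$
For the "see below" we note that both $\mathbb{M}f\circ\varepsilon_{\mathbf{D}|\mathbf{B}}$ and $\gamma_{\mathbf{A}|\mathbf{B}}$ are functions on $\mathbf{B}$ (in fact of type $\mathbf{B}\to\mathbf{B}$), and that in general if we have two functions $f_{\{1,2\}}{:}\,\mathbf{P}\to\mathbf{F}$ with $f_1\approx_\pi f_2$ for some $\pi{:}\,\mathbb{M}\mathbf{P}$, then $\mathbb{M}f_1.\pi=\mathbb{M}f_2.\pi$.

That establishes the transitivity of entropy refinement in the general case of proper measures.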

D Refinement chains have suprema 



As we showed in §6.1, the discrete hypers are not closed under suprema of chains: for the example we gave there, a measure was required. We show here that in fact measures are sufficient, in other words that sup-closure is achieved.

Footnote: Reason that for any $F\in\mathcal{F}$ we have
$$\begin{array}{cll}
 & \mathbb{M}f_1.\pi.F \\
= & \pi.(f_1^{-1}.F) & \text{defn } \mathbb{M} \\
= & \pi.(f_2^{-1}.F) & f_1\approx_\pi f_2 \\
= & \mathbb{M}f_2.\pi.F~,
\end{array}$$
hence $\mathbb{M}f_1.\pi=\mathbb{M}f_2.\pi$ since $F$ was arbitrary.



Definition D.1 (Continuous relation) Say that a relation $R{:}\,X\leftrightarrow Y$ between two complete metric spaces $X, Y$ is continuous if for every pair of convergent sequences $\{x_i\}_i$ and $\{y_i\}_i$ with $x_i\,(R)\,y_i$ for all $i$ we have also
$$\lim_i x_i\ (R)\ \lim_i y_i~.$$

□



Lemma D.1 (Continuous functions as continuous relations) If $f{:}\,X\to Y$ is continuous, then both $f$ and its converse $f^{-1}$ are continuous relations.

Proof: Immediate from Def. D.1. □

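Lemma D.2 (Composition of continuous relations) If $R{:}\,X\leftrightarrow Y$ and $S{:}\,Y\leftrightarrow Z$ are continuous relations between metric spaces $X, Y, Z$ with additionally $Y$ compact, then their composition $R\circ S\in X\leftrightarrow Z$ is continuous also.

Proof: With the assumptions of Def. D.1 wrt. $R\circ S$, i.e. convergent sequences $\{x_i\}_i$ and $\{z_i\}_i$ with limits $x, z$ and $x_i\,(R\circ S)\,z_i$ for all $i$, there is a sequence $\{y_i\}_i$ in $Y$ such that $x_i\,(R)\,y_i\land y_i\,(S)\,z_i$ for all $i$. From compactness of $Y$ there is then a convergent subsequence $\{y_j\}_j$ with limit $y$, say; and by continuity of $R$ and $S$ separately, applied with the corresponding subsequences $\{x_j\}_j$ and $\{z_j\}_j$ in $X, Z$ (whose limits are still $x, z$), we have $x\,(R)\,y\land y\,(S)\,z$, so that in fact $x\,(R\circ S)\,z$. □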

Lemma D.3 Entropy refinement is a partial order.

Proof: Its reflexivity is immediate by taking $\boldsymbol{\Delta}=(\mathbb{M}\eta).\Delta_S$ in Def. B.1(6); its antisymmetry was proved in §B; its transitivity was proved in §C. □

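Lemma D.4 (Continuity of entropy refinement) The entropy-refinement relation $(\sqsubseteq_{\mathrm{E}})$ between hypers, as defined in Def. B.1(6) and thus in $\mathbb{M}\mathbb{D}\mathcal{X}\leftrightarrow\mathbb{M}\mathbb{D}\mathcal{X}$, is continuous in the sense of Def. D.1.

Proof: From Lem. A.1 we have that $\boldsymbol{\mu}{:}\,\mathbb{M}^2\mathbb{D}\mathcal{X}\to\mathbb{M}\mathbb{D}\mathcal{X}$ is nonexpansive. Since also $\boldsymbol{\mu}{:}\,\mathbb{M}\mathbb{D}\mathcal{X}\to\mathbb{D}\mathcal{X}$ is nonexpansive, we have that $\mathbb{M}\boldsymbol{\mu}{:}\,\mathbb{M}^2\mathbb{D}\mathcal{X}\to\mathbb{M}\mathbb{D}\mathcal{X}$ is nonexpansive as well (same lemma).

Since the entropy-refinement relation is the composition of the converse of the first with the second, we have its continuity by Lem. D.1 and Lem. D.2 (nonexpansive functions being continuous), given that $\mathbb{M}^2\mathbb{D}\mathcal{X}$ is compact (and 1-bounded) because (ultimately) $\mathcal{X}$ is compact (and 1-bounded). □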

Lemma D.5 (Continuity of termination refinement) The termination-refinement relation $(\preceq)$ between hypers as defined in Def. 7.2 is continuous in the sense of Def. D.1.

Proof: Trivial. □
Corollary D.1 (Continuity of secure refinement) The secure-refinement relation $(\sqsubseteq)$ between hypers as defined in Def. 7.3 is continuous in the sense of Def. D.1.

Proof: Immediate from Lemmas D.5, D.4 and D.2, since $\mathbb{M}\mathbb{D}\mathcal{X}$ is compact. □



Footnote: For consistency with the main report we use $\mathcal{X}$ rather than $\mathbf{X}$ here.
Lemma D.6 (Refinement chains have suprema) Let $\{\Delta_i\}_i$ be a $(\sqsubseteq)$-chain in $\mathbb{M}\mathbb{D}\mathcal{X}$; then it has a $(\sqsubseteq)$-supremum in $\mathbb{M}\mathbb{D}\mathcal{X}$.

Proof: Since $\mathbb{M}\mathbb{D}\mathcal{X}$ is compact there is an infinite subsequence converging to some $\Delta$. For any $\Delta_i$ in our original sequence consider the tail of the infinite subsequence beginning beyond that point, that is $\{\Delta_j\}_{j\ge j_0}$ for $j_0$ corresponding to a point at $i$ or beyond in the original sequence. Since we have $\Delta_i\sqsubseteq\Delta_j$ for all $j\ge j_0$, we have by continuity of refinement (Cor. D.1) that also $\Delta_i\sqsubseteq\lim_{j\ge j_0}\Delta_j=\Delta$.

Now suppose that $\Delta_i\sqsubseteq\Delta'$ for all $i$. That means in particular that $\Delta_j\sqsubseteq\Delta'$ for all $j$, and thus again by continuity that $\Delta\sqsubseteq\Delta'$.

Hence $\Delta=\bigsqcup_i\Delta_i$ as required. □



E Iteration is monotonic for secure refinement 

It is trivial (and often assumed without comment) that iteration defined as a least fixed point with respect to some partial order $(\sqsubseteq)$ is monotonic with respect to that same order in the iteration body: this is simply distribution of $(\sqsubseteq)$ through $(\sqsubseteq)$-suprema of chains. In our case, however, we use termination refinement $(\preceq)$ for the chains producing least fixed points, for reasons we explained in §6.2, yet we are interested in the monotonicity of secure refinement $(\sqsubseteq)$ with respect to their suprema.

We show that in fact the $(\preceq)$-chains can be treated as $(\sqsubseteq)$-chains, so that the above trivial argument then applies.

Lemma E.1 (Termination-refinement chains converge) Let $\{\Delta_i\}_i$ be a $(\preceq)$-chain in $\mathbb{M}\mathbb{D}\mathcal{X}$. Then it converges in the Kantorovich metric.

Proof: Recall that we are using subprobability measures, and observe that for any $\Delta\preceq\Delta'$ the Kantorovich distance between $\Delta$ and $\Delta'$ cannot exceed $\sum\Delta'-\sum\Delta$, and that this difference of weights converges to zero along any $(\preceq)$-chain. □

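Lemma E.2 (Termination- and refinement-suprema agree on termination chains) Let $\{\Delta_i\}_i$ be a $(\preceq)$-chain in $\mathbb{M}\mathbb{D}\mathcal{X}$. Then $\bigsqcup^{\preceq}_i\Delta_i=\bigsqcup^{\sqsubseteq}_i\Delta_i$.

Proof: We prove $\bigsqcup^{\preceq}_i\Delta_i\sqsubseteq\bigsqcup^{\sqsubseteq}_i\Delta_i\sqsubseteq\bigsqcup^{\preceq}_i\Delta_i$ and appeal to antisymmetry.

We know already that both suprema exist, and it is trivial from $(\preceq)\subseteq(\sqsubseteq)$ that $\bigsqcup^{\sqsubseteq}_i\Delta_i\sqsubseteq\bigsqcup^{\preceq}_i\Delta_i$, since the latter is then a $(\sqsubseteq)$-upper bound of the chain. The other direction $\bigsqcup^{\preceq}_i\Delta_i\sqsubseteq\bigsqcup^{\sqsubseteq}_i\Delta_i$ follows from continuity of $(\sqsubseteq)$ and the convergence of $\{\Delta_i\}_i$ as established in Lem. E.1. □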



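Lemma E.3 (Iteration is monotonic with respect to secure refinement) Let $\{\Delta^S_i\}_i$ and $\{\Delta^I_i\}_i$ be two $(\preceq)$-chains in $\mathbb{M}\mathbb{D}\mathcal{X}$ with $(\preceq)$-suprema $\Delta^{\{S,I\}}$ respectively, and suppose that $\Delta^S_i\sqsubseteq\Delta^I_i$ for each $i$. Then we have also $\Delta^S\sqsubseteq\Delta^I$.

Proof: This is now trivial, since Lem. E.2 established that $\{\Delta^{\{S,I\}}_i\}_i$, considered as $(\sqsubseteq)$-chains, have those same $\Delta^{\{S,I\}}$ as their $(\sqsubseteq)$-suprema. □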